Preface


This text is intended as an introduction to elementary probability theory and stochastic 
processes. It is particularly well suited for those wanting to see how probability theory 
can be applied to the study of phenomena in fields such as engineering, computer
science, management science, the physical and social sciences, and operations research.

It is generally felt that there are two approaches to the study of probability theory. 
One approach is heuristic and nonrigorous and attempts to develop in the student an 
intuitive feel for the subject that enables him or her to “think probabilistically.” The 
other approach attempts a rigorous development of probability by using the tools of 
measure theory. It is the first approach that is employed in this text. However, because 
it is extremely important in both understanding and applying probability theory to be 
able to “think probabilistically,” this text should also be useful to students interested 
primarily in the second approach. 

New to This Edition 

The tenth edition includes new text material, examples, and exercises chosen not only 
for their inherent interest and applicability but also for their usefulness in strengthening
the reader’s probabilistic knowledge and intuition. The new text material includes
Section 2.7, which builds on the inclusion/exclusion identity to find the distribution 
of the number of events that occur; and Section 3.6.6 on left skip free random walks, 
which can be used to model the fortunes of an investor (or gambler) who always 
invests 1 and then receives a nonnegative integral return. Section 4.2 has additional 
material on Markov chains that shows how to modify a given chain when trying to 
determine such things as the probability that the chain ever enters a given class of 
states by some time, or the conditional distribution of the state at some time given that 
the class has never been entered. A new remark in Section 7.2 shows that results from 
the classical insurance ruin model also hold in other important ruin models. There is 
new material on exponential queueing models, including, in Section 2.2, a determination
of the mean and variance of the number of lost customers in a busy period of a
finite capacity queue, as well as the new Section 8.3.3 on birth and death queueing 
models. Section 11.8.2 gives a new approach that can be used to simulate the exact 
stationary distribution of a Markov chain that satisfies a certain property. 

Among the newly added examples are 1.11, which is concerned with a multiple 
player gambling problem; 3.20, which finds the variance in the matching rounds 
problem; 3.30, which deals with the characteristics of a random selection from a 
population; and 4.25, which deals with the stationary distribution of a Markov chain. 
Course

Ideally, this text would be used in a one-year course in probability models. Other 
possible courses would be a one-semester course in introductory probability theory 
(involving Chapters 1-3 and parts of others) or a course in elementary stochastic 
processes. The textbook is designed to be flexible enough to be used in a variety of 
possible courses. For example, I have used Chapters 5 and 8, with smatterings from
Chapters 4 and 6, as the basis of an introductory course in queueing theory. 

Examples and Exercises 

Many examples are worked out throughout the text, and there are also a large number 
of exercises to be solved by students. More than 100 of these exercises have been 
starred and their solutions provided at the end of the text. These starred problems can 
be used for independent study and test preparation. An Instructor’s Manual, containing
solutions to all exercises, is available free to instructors who adopt the book for
class. 

Organization 

Chapters 1 and 2 deal with basic ideas of probability theory. In Chapter 1 an axiomatic 
framework is presented, while in Chapter 2 the important concept of a random variable
is introduced. Section 2.6.1 gives a simple derivation of the joint distribution of
the sample mean and sample variance of a normal data sample. 

Chapter 3 is concerned with the subject matter of conditional probability and conditional
expectation. “Conditioning” is one of the key tools of probability theory, and
it is stressed throughout the book. When properly used, conditioning often enables us 
to easily solve problems that at first glance seem quite difficult. The final section of 
this chapter presents applications to (1) a computer list problem, (2) a random graph, 
and (3) the Polya urn model and its relation to the Bose-Einstein distribution. Section 
3.6.5 presents k-record values and the surprising Ignatov’s theorem.

In Chapter 4 we come into contact with our first random, or stochastic, process, 
known as a Markov chain, which is widely applicable to the study of many real-world 
phenomena. Applications to genetics and production processes are presented. The 
concept of time reversibility is introduced and its usefulness illustrated. Section 4.5.3 
presents an analysis, based on random walk theory, of a probabilistic algorithm for 
the satisfiability problem. Section 4.6 deals with the mean times spent in transient 
states by a Markov chain. Section 4.9 introduces Markov chain Monte Carlo methods. 
In the final section we consider a model for optimally making decisions known as a 
Markovian decision process. 

In Chapter 5 we are concerned with a type of stochastic process known as a counting
process. In particular, we study a kind of counting process known as a Poisson
process. The intimate relationship between this process and the exponential distribution
is discussed. New derivations for the Poisson and nonhomogeneous Poisson
processes are discussed. Examples relating to analyzing greedy algorithms, minimizing
highway encounters, collecting coupons, and tracking the AIDS virus, as well as
material on compound Poisson processes, are included in this chapter. Section 5.2.4
gives a simple derivation of the convolution of exponential random variables.

Chapter 6 considers Markov chains in continuous time with an emphasis on birth 
and death models. Time reversibility is shown to be a useful concept, as it is in the 
study of discrete-time Markov chains. Section 6.7 presents the computationally 
important technique of uniformization. 

Chapter 7, the renewal theory chapter, is concerned with a type of counting process
more general than the Poisson. By making use of renewal reward processes,
limiting results are obtained and applied to various fields. Section 7.9 presents new 
results concerning the distribution of time until a certain pattern occurs when a 
sequence of independent and identically distributed random variables is observed. 
In Section 7.9.1, we show how renewal theory can be used to derive both the mean 
and the variance of the length of time until a specified pattern appears, as well as 
the mean time until one of a finite number of specified patterns appears. In Section 
7.9.2, we suppose that the random variables are equally likely to take on any of m 
possible values, and compute an expression for the mean time until a run of m distinct
values occurs. In Section 7.9.3, we suppose the random variables are continuous
and derive an expression for the mean time until a run of m consecutive increasing 
values occurs. 

Chapter 8 deals with queueing, or waiting line, theory. After some preliminaries 
dealing with basic cost identities and types of limiting probabilities, we consider 
exponential queueing models and show how such models can be analyzed. Included 
in the models we study is the important class known as a network of queues. We 
then study models in which some of the distributions are allowed to be arbitrary. 
Included are Section 8.6.3 dealing with an optimization problem concerning a 
single server, general service time queue, and Section 8.8, concerned with a single 
server, general service time queue in which the arrival source is a finite number of 
potential users. 

Chapter 9 is concerned with reliability theory. This chapter will probably be of 
greatest interest to the engineer and operations researcher. Section 9.6.1 illustrates a 
method for determining an upper bound for the expected life of a parallel system of 
not necessarily independent components and Section 9.7.1 analyzes a series structure 
reliability model in which components enter a state of suspended animation when one 
of their cohorts fails. 

Chapter 10 is concerned with Brownian motion and its applications. The theory 
of options pricing is discussed. Also, the arbitrage theorem is presented and its relationship
to the duality theorem of linear programming is indicated. We show how the
arbitrage theorem leads to the Black-Scholes option pricing formula. 

Chapter 11 deals with simulation, a powerful tool for analyzing stochastic models
that are analytically intractable. Methods for generating the values of arbitrarily
distributed random variables are discussed, as are variance reduction methods for 
increasing the efficiency of the simulation. Section 11.6.4 introduces the valuable 
simulation technique of importance sampling, and indicates the usefulness of tilted 
distributions when applying this method. 
Introduction to
Probability Theory 



1.1 Introduction 

Any realistic model of a real-world phenomenon must take into account the possibility 
of randomness. That is, more often than not, the quantities we are interested in will not 
be predictable in advance but, rather, will exhibit an inherent variation that should be 
taken into account by the model. This is usually accomplished by allowing the model to 
be probabilistic in nature. Such a model is, naturally enough, referred to as a probability 
model. 

The majority of the chapters of this book will be concerned with different probability 
models of natural phenomena. Clearly, in order to master both the “model building” 
and the subsequent analysis of these models, we must have a certain knowledge of basic 
probability theory. The remainder of this chapter, as well as the next two chapters, will 
be concerned with a study of this subject. 


1.2 Sample Space and Events 

Suppose that we are about to perform an experiment whose outcome is not predictable in 
advance. However, while the outcome of the experiment will not be known in advance, 
let us suppose that the set of all possible outcomes is known. This set of all possible 
outcomes of an experiment is known as the sample space of the experiment and is 
denoted by S. 
Some examples are the following.

1. If the experiment consists of the flipping of a coin, then 

S = {H, T}

where H means that the outcome of the toss is a head and T that it is a tail. 

2. If the experiment consists of rolling a die, then the sample space is 

S = {1,2,3,4,5,6} 

where the outcome i means that i appeared on the die, i = 1,2, 3, 4, 5, 6. 

3. If the experiment consists of flipping two coins, then the sample space consists of
the following four points:

S = {(H, H), (H, T), (T, H), (T, T)}


The outcome will be (H, H) if both coins come up heads; it will be (H, T) if the
first coin comes up heads and the second comes up tails; it will be (T, H) if the first
comes up tails and the second heads; and it will be (T, T) if both coins come up
tails.

4. If the experiment consists of rolling two dice, then the sample space consists of the 
following 36 points: 


(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)


where the outcome (i, j) is said to occur if i appears on the first die and j on the
second die.

5. If the experiment consists of measuring the lifetime of a car, then the sample space 
consists of all nonnegative real numbers. That is,* 


S = [0, ∞)


Any subset E of the sample space S is known as an event. Some examples of events 
are the following. 

1'. In Example (1), if E = {H}, then E is the event that a head appears on the flip of
the coin. Similarly, if E = {T}, then E would be the event that a tail appears.

2'. In Example (2), if E = {1}, then E is the event that one appears on the roll of the
die. If E = {2, 4, 6}, then E would be the event that an even number appears on
the roll.

* The set (a, b) is defined to consist of all points x such that a < x < b. The set [a, b] is defined to consist
of all points x such that a ≤ x ≤ b. The sets (a, b] and [a, b) are defined, respectively, to consist of all
points x such that a < x ≤ b and all points x such that a ≤ x < b.
3'. In Example (3), if E = {(H, H), (H, T)}, then E is the event that a head appears
on the first coin.

4'. In Example (4), if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then E is the 
event that the sum of the dice equals seven. 

5'. In Example (5), if E = (2, 6), then E is the event that the car lasts between two 
and six years. ■ 
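As a rough illustration of Examples (1) through (4) and the events (1') through (4'), the finite sample spaces can be enumerated as Python sets, with events represented as subsets. The variable names below are illustrative assumptions and are not notation from the text.

```python
from itertools import product

# Sample spaces from Examples (1)-(4); the names are illustrative only.
coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}
two_coins = set(product("HT", repeat=2))
two_dice = set(product(range(1, 7), repeat=2))

# Events are subsets of the sample space.
first_coin_heads = {(a, b) for (a, b) in two_coins if a == "H"}
sum_is_seven = {(i, j) for (i, j) in two_dice if i + j == 7}

print(len(two_dice))         # number of outcomes for two dice
print(sorted(sum_is_seven))  # the event of Example (4')
```

The continuous sample space of Example (5) cannot be listed this way; it would need an interval representation instead.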

We say that the event E occurs when the outcome of the experiment lies in E. For
any two events E and F of a sample space S we define the new event E ∪ F to consist
of all outcomes that are either in E or in F or in both E and F. That is, the event E ∪ F
will occur if either E or F occurs. For example, in (1) if E = {H} and F = {T}, then

E ∪ F = {H, T}

That is, E ∪ F would be the whole sample space S. In (2) if E = {1, 3, 5} and
F = {1, 2, 3}, then


E ∪ F = {1, 2, 3, 5}


and thus E ∪ F would occur if the outcome of the die is 1 or 2 or 3 or 5. The event
E ∪ F is often referred to as the union of the event E and the event F.

For any two events E and F, we may also define the new event EF, sometimes
written E ∩ F, and referred to as the intersection of E and F, as follows. EF consists
of all outcomes which are both in E and in F. That is, the event EF will occur only if
both E and F occur. For example, in (2) if E = {1, 3, 5} and F = {1, 2, 3}, then


EF = {1, 3}


and thus EF would occur if the outcome of the die is either 1 or 3. In Example (1) if
E = {H} and F = {T}, then the event EF would not consist of any outcomes and
hence could not occur. To give such an event a name, we shall refer to it as the null
event and denote it by ∅. (That is, ∅ refers to the event consisting of no outcomes.) If
EF = ∅, then E and F are said to be mutually exclusive.

We also define unions and intersections of more than two events in a similar manner.
If $E_1, E_2, \ldots$ are events, then the union of these events, denoted by $\bigcup_{n=1}^{\infty} E_n$, is defined
to be the event that consists of all outcomes that are in $E_n$ for at least one value
of n = 1, 2, .... Similarly, the intersection of the events $E_n$, denoted by $\bigcap_{n=1}^{\infty} E_n$,
is defined to be the event consisting of those outcomes that are in all of the events
$E_n$, n = 1, 2, ....

Finally, for any event E we define the new event $E^c$, referred to as the complement of
E, to consist of all outcomes in the sample space S that are not in E. That is, $E^c$ will occur
if and only if E does not occur. In Example (4) if E = {(1, 6), (2, 5), (3, 4), (4, 3),
(5, 2), (6, 1)}, then $E^c$ will occur if the sum of the dice does not equal seven. Also note
that since the experiment must result in some outcome, it follows that $S^c = \emptyset$.
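The union, intersection, and complement just defined correspond directly to set operations. The following is a minimal sketch using the die events from Example (2); the variable names are our own.

```python
# Events from Example (2): rolling one die. Names are illustrative.
S = {1, 2, 3, 4, 5, 6}
E = {1, 3, 5}
F = {1, 2, 3}

union = E | F           # E ∪ F: outcomes in E or in F (or both)
intersection = E & F    # EF: outcomes in both E and F
complement_E = S - E    # E^c: outcomes of S that are not in E

# E and {2, 4, 6} share no outcomes, so they are mutually exclusive.
mutually_exclusive = E.isdisjoint({2, 4, 6})

print(union, intersection, complement_E, mutually_exclusive)
```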
1.3 Probabilities Defined on Events

Consider an experiment whose sample space is S. For each event E of the sample 
space S, we assume that a number P(E) is defined and satisfies the following three 
conditions: 

(i) 0 ≤ P(E) ≤ 1.

(ii) P(S) = 1.

(iii) For any sequence of events $E_1, E_2, \ldots$ that are mutually exclusive, that is, events
for which $E_n E_m = \emptyset$ when n ≠ m, then

$$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$$

We refer to P(E) as the probability of the event E. 

Example 1.1 In the coin tossing example, if we assume that a head is equally likely 
to appear as a tail, then we would have 

P({H}) = P({T}) = 1/2

On the other hand, if we had a biased coin and felt that a head was twice as likely to 
appear as a tail, then we would have 

P({H}) = 2/3,    P({T}) = 1/3 ■

Example 1.2 In the die tossing example, if we supposed that all six numbers were 
equally likely to appear, then we would have 

P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6

From (iii) it would follow that the probability of getting an even number would equal 

P({2, 4, 6}) = P({2}) + P({4}) + P({6}) = 1/2 ■

Remark We have chosen to give a rather formal definition of probabilities as being 
functions defined on the events of a sample space. However, it turns out that these 
probabilities have a nice intuitive property. Namely, if our experiment is repeated over 
and over again then (with probability 1) the proportion of time that event E occurs will 
just be P(E). 
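The long-run frequency interpretation in the Remark can be explored numerically. The sketch below simulates rolls of a fair die with Python's pseudorandom generator and reports the proportion of rolls that land in the event {2, 4, 6}; the function name, the seed, and the trial counts are arbitrary choices made for illustration.

```python
import random

def relative_frequency(event, trials, seed=0):
    """Fraction of simulated fair-die rolls that land in `event`."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

even = {2, 4, 6}
for trials in (100, 10_000, 100_000):
    print(trials, relative_frequency(even, trials))
```

As the number of trials grows, the printed proportions should settle near P({2, 4, 6}) = 1/2, although any finite simulation only approximates the limit.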

Since the events E and $E^c$ are always mutually exclusive and since $E \cup E^c = S$ we
have by (ii) and (iii) that

$$1 = P(S) = P(E \cup E^c) = P(E) + P(E^c)$$

or

$$P(E^c) = 1 - P(E) \tag{1.1}$$
In words, Equation (1.1) states that the probability that an event does not occur is
one minus the probability that it does occur.

We shall now derive a formula for P(E ∪ F), the probability of all outcomes either
in E or in F. To do so, consider P(E) + P(F), which is the probability of all outcomes
in E plus the probability of all points in F. Since any outcome that is in both E and F
will be counted twice in P(E) + P(F) and only once in P(E ∪ F), we must have

$$P(E) + P(F) = P(E \cup F) + P(EF)$$

or equivalently

$$P(E \cup F) = P(E) + P(F) - P(EF) \tag{1.2}$$

Note that when E and F are mutually exclusive (that is, when EF = ∅), then
Equation (1.2) states that

$$P(E \cup F) = P(E) + P(F) - P(\emptyset) = P(E) + P(F)$$

a result which also follows from condition (iii). (Why is P(∅) = 0?)

Example 1.3 Suppose that we toss two coins and assume that each
of the four outcomes in the sample space

S = {(H, H), (H, T), (T, H), (T, T)}

is equally likely and hence has probability 1/4. Let

E = {(H, H), (H, T)} and F = {(H, H), (T, H)}

That is, E is the event that the first coin falls heads, and F is the event that the second
coin falls heads.

By Equation (1.2) we have that P(E ∪ F), the probability that either the first or the
second coin falls heads, is given by

$$P(E \cup F) = P(E) + P(F) - P(EF) = \tfrac{1}{2} + \tfrac{1}{2} - P(\{(H, H)\}) = 1 - \tfrac{1}{4} = \tfrac{3}{4}$$

This probability could, of course, have been computed directly since

$$P(E \cup F) = P(\{(H, H), (H, T), (T, H)\}) = \tfrac{3}{4}$$ ■

We may also calculate the probability that any one of the three events E or F or G
occurs. This is done as follows:

$$P(E \cup F \cup G) = P((E \cup F) \cup G)$$

which by Equation (1.2) equals

$$P(E \cup F) + P(G) - P((E \cup F)G)$$

Now we leave it for you to show that the events (E ∪ F)G and EG ∪ FG are equivalent,
and hence the preceding equals

$$\begin{aligned}
P(E \cup F \cup G) &= P(E) + P(F) - P(EF) + P(G) - P(EG \cup FG) \\
&= P(E) + P(F) - P(EF) + P(G) - P(EG) - P(FG) + P(EGFG) \\
&= P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)
\end{aligned} \tag{1.3}$$

In fact, it can be shown by induction that, for any n events $E_1, E_2, E_3, \ldots, E_n$,

$$\begin{aligned}
P(E_1 \cup E_2 \cup \cdots \cup E_n) = {} & \sum_{i} P(E_i) - \sum_{i<j} P(E_i E_j) + \sum_{i<j<k} P(E_i E_j E_k) \\
& - \sum_{i<j<k<l} P(E_i E_j E_k E_l) \\
& + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)
\end{aligned} \tag{1.4}$$

In words, Equation (1.4), known as the inclusion-exclusion identity, states that the 
probability of the union of n events equals the sum of the probabilities of these events 
taken one at a time minus the sum of the probabilities of these events taken two at a 
time plus the sum of the probabilities of these events taken three at a time, and so on. 
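The inclusion-exclusion identity can be checked numerically on a small example. The sketch below assumes equally likely outcomes for two dice and uses three events of our own choosing; the helper names `prob` and `union_by_inclusion_exclusion` are hypothetical and only serve this illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def prob(event, space):
    """Probability of an event when all outcomes of `space` are equally likely."""
    return Fraction(len(event), len(space))

def union_by_inclusion_exclusion(events, space):
    """Right-hand side of Equation (1.4) for a list of events (sets)."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1          # alternate +, -, +, ...
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group), space)
    return total

space = set(product(range(1, 7), repeat=2))
E = {o for o in space if o[0] == 6}      # first die shows six
F = {o for o in space if o[1] == 6}      # second die shows six
G = {o for o in space if sum(o) == 7}    # sum equals seven

lhs = prob(E | F | G, space)
rhs = union_by_inclusion_exclusion([E, F, G], space)
print(lhs, rhs)
```

Both sides come out equal, as Equation (1.3) requires for three events.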


1.4 Conditional Probabilities 

Suppose that we toss two dice and that each of the 36 possible outcomes is equally 
likely to occur and hence has probability 1/36. Suppose that we observe that the first
die is a four. Then, given this information, what is the probability that the sum of the
two dice equals six? To calculate this probability we reason as follows: Given that
the initial die is a four, it follows that there can be at most six possible outcomes of
our experiment, namely, (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), and (4, 6). Since each of
these outcomes originally had the same probability of occurring, they should still have
equal probabilities. That is, given that the first die is a four, then the (conditional)
probability of each of the outcomes (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6) is 1/6 while
the (conditional) probability of the other 30 points in the sample space is 0. Hence, the
desired probability will be 1/6.

If we let E and F denote, respectively, the event that the sum of the dice is six 
and the event that the first die is a four, then the probability just obtained is called the 
conditional probability that E occurs given that F has occurred and is denoted by 


P(E|F)

A general formula for P(E|F) that is valid for all events E and F is derived in the
same manner as the preceding. Namely, if the event F occurs, then in order for E to
occur it is necessary for the actual occurrence to be a point in both E and in F, that is, it
must be in EF. Now, because we know that F has occurred, it follows that F becomes
our new sample space and hence the probability that the event EF occurs will equal the
probability of EF relative to the probability of F. That is,

$$P(E|F) = \frac{P(EF)}{P(F)} \tag{1.5}$$

Note that Equation (1.5) is only well defined when P(F) > 0 and hence P(E|F) is
only defined when P(F) > 0.
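For a finite sample space with equally likely outcomes, Equation (1.5) reduces to counting, since P(EF)/P(F) equals the number of outcomes in EF divided by the number in F. The sketch below applies this to the two-dice question that opens the section; the function name `cond_prob` is our own.

```python
from fractions import Fraction
from itertools import product

def cond_prob(E, F):
    """P(E|F) = P(EF)/P(F) for equally likely outcomes; requires F nonempty."""
    if not F:
        raise ValueError("P(E|F) is undefined when P(F) = 0")
    return Fraction(len(E & F), len(F))

space = set(product(range(1, 7), repeat=2))
E = {o for o in space if sum(o) == 6}   # sum of the dice is six
F = {o for o in space if o[0] == 4}     # first die is a four

print(cond_prob(E, F))  # 1/6
```

The explicit check for an empty F mirrors the requirement that P(F) > 0.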

Example 1.4 Suppose cards numbered one through ten are placed in a hat, mixed up, 
and then one of the cards is drawn. If we are told that the number on the drawn card is 
at least five, then what is the conditional probability that it is ten? 

Solution: Let E denote the event that the number of the drawn card is ten, and let
F be the event that it is at least five. The desired probability is P(E|F). Now, from
Equation (1.5)

$$P(E|F) = \frac{P(EF)}{P(F)}$$

However, EF = E since the number of the card will be both ten and at least five if
and only if it is number ten. Hence,

$$P(E|F) = \frac{1/10}{6/10} = \frac{1}{6}$$


Example 1.5 A family has two children. What is the conditional probability that both
are boys given that at least one of them is a boy? Assume that the sample space S is
given by S = {(b, b), (b, g), (g, b), (g, g)}, and all outcomes are equally likely. ((b, g)
means, for instance, that the older child is a boy and the younger child a girl.)

Solution: Letting B denote the event that both children are boys, and A the event
that at least one of them is a boy, then the desired probability is given by

$$P(B|A) = \frac{P(BA)}{P(A)} = \frac{P(\{(b, b)\})}{P(\{(b, b), (b, g), (g, b)\})} = \frac{1/4}{3/4} = \frac{1}{3}$$

Example 1.6 Bev can either take a course in computers or in chemistry. If Bev takes
the computer course, then she will receive an A grade with probability 1/2; if she takes
the chemistry course then she will receive an A grade with probability 1/3. Bev decides
to base her decision on the flip of a fair coin. What is the probability that Bev will get
an A in chemistry?

Solution: If we let C be the event that Bev takes chemistry and A denote the event
that she receives an A in whatever course she takes, then the desired probability is
P(AC). This is calculated by using Equation (1.5) as follows:

$$P(AC) = P(C)P(A|C) = \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{6}$$ ■

Example 1.7 Suppose an urn contains seven black balls and five white balls. We 
draw two balls from the urn without replacement. Assuming that each ball in the urn 
is equally likely to be drawn, what is the probability that both drawn balls are black? 

Solution: Let F and E denote, respectively, the events that the first and second
balls drawn are black. Now, given that the first ball selected is black, there are six
remaining black balls and five white balls, and so P(E|F) = 6/11. As P(F) is clearly
7/12, our desired probability is

$$P(EF) = P(F)P(E|F) = \tfrac{7}{12} \cdot \tfrac{6}{11} = \tfrac{42}{132}$$ ■
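The urn calculation of Example 1.7 can be cross-checked by listing every ordered pair of distinct balls. The ball labels below are an assumption introduced only so that the balls can be told apart.

```python
from fractions import Fraction
from itertools import permutations

# Label the balls: seven black (B0..B6) and five white (W0..W4).
urn = [f"B{i}" for i in range(7)] + [f"W{i}" for i in range(5)]

# Ordered draws of two distinct balls; each of the 12 * 11 draws is equally likely.
draws = list(permutations(urn, 2))
both_black = [d for d in draws if d[0][0] == "B" and d[1][0] == "B"]

p_enumerated = Fraction(len(both_black), len(draws))
p_formula = Fraction(7, 12) * Fraction(6, 11)   # P(F) P(E|F)
print(p_enumerated, p_formula)
```

The two values agree; `Fraction` reduces 42/132 to lowest terms, 7/22.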

Example 1.8 Suppose that each of three men at a party throws his hat into the center 
of the room. The hats are first mixed up and then each man randomly selects a hat. 
What is the probability that none of the three men selects his own hat? 

Solution: We shall solve this by first calculating the complementary probability
that at least one man selects his own hat. Let us denote by $E_i$, i = 1, 2, 3, the event
that the ith man selects his own hat. To calculate the probability $P(E_1 \cup E_2 \cup E_3)$,
we first note that

$$\begin{aligned}
P(E_i) &= \tfrac{1}{3}, \quad i = 1, 2, 3 \\
P(E_i E_j) &= \tfrac{1}{6}, \quad i \neq j \\
P(E_1 E_2 E_3) &= \tfrac{1}{6}
\end{aligned} \tag{1.6}$$

To see why Equation (1.6) is correct, consider first

$$P(E_i E_j) = P(E_i) P(E_j | E_i)$$

Now $P(E_i)$, the probability that the ith man selects his own hat, is clearly 1/3 since
he is equally likely to select any of the three hats. On the other hand, given that the
ith man has selected his own hat, then there remain two hats that the jth man may
select, and as one of these two is his own hat, it follows that with probability 1/2 he
will select it. That is, $P(E_j | E_i) = \tfrac{1}{2}$ and so

$$P(E_i E_j) = P(E_i) P(E_j | E_i) = \tfrac{1}{3} \cdot \tfrac{1}{2} = \tfrac{1}{6}$$

To calculate $P(E_1 E_2 E_3)$ we write

$$P(E_1 E_2 E_3) = P(E_1 E_2) P(E_3 | E_1 E_2) = \tfrac{1}{6} P(E_3 | E_1 E_2)$$
However, given that the first two men get their own hats it follows that the third man
must also get his own hat (since there are no other hats left). That is, $P(E_3 | E_1 E_2) = 1$
and so

$$P(E_1 E_2 E_3) = \tfrac{1}{6}$$

Now, from Equation (1.4) we have that

$$\begin{aligned}
P(E_1 \cup E_2 \cup E_3) &= P(E_1) + P(E_2) + P(E_3) - P(E_1 E_2) - P(E_1 E_3) - P(E_2 E_3) + P(E_1 E_2 E_3) \\
&= 1 - \tfrac{1}{2} + \tfrac{1}{6} \\
&= \tfrac{2}{3}
\end{aligned}$$

Hence, the probability that none of the men selects his own hat is 1 - 2/3 = 1/3. ■
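The hat-matching answer can also be confirmed by brute force: list every way the hats can be handed out and count the arrangements in which nobody receives his own. This is a sketch under the example's assumption that all arrangements are equally likely; the function name `prob_no_match` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def prob_no_match(n):
    """Probability that none of n people picks their own hat, by enumeration."""
    perms = list(permutations(range(n)))
    # p[i] is the hat taken by person i; a match means p[i] == i.
    no_match = [p for p in perms if all(p[i] != i for i in range(n))]
    return Fraction(len(no_match), len(perms))

print(prob_no_match(3))  # 1/3
```

Enumeration is only practical for small n, since the number of arrangements grows as n!.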


1.5 Independent Events 

Two events E and F are said to be independent if

P(EF) = P(E)P(F)

By Equation (1.5) this implies that E and F are independent if

P(E|F) = P(E)

(which also implies that P(F|E) = P(F)). That is, E and F are independent if
knowledge that F has occurred does not affect the probability that E occurs. That is, 
the occurrence of E is independent of whether or not F occurs. 

Two events E and F that are not independent are said to be dependent. 

Example 1.9 Suppose we toss two fair dice. Let $E_1$ denote the event that the sum of
the dice is six and F denote the event that the first die equals four. Then

$$P(E_1 F) = P(\{(4, 2)\}) = \tfrac{1}{36}$$

while

$$P(E_1)P(F) = \tfrac{5}{36} \cdot \tfrac{1}{6} = \tfrac{5}{216}$$

and hence $E_1$ and F are not independent. Intuitively, the reason for this is clear for
if we are interested in the possibility of throwing a six (with two dice), then we will
be quite happy if the first die lands four (or any of the numbers 1, 2, 3, 4, 5) because
then we still have a possibility of getting a total of six. On the other hand, if the first
die landed six, then we would be unhappy as we would no longer have a chance of
getting a total of six. In other words, our chance of getting a total of six depends on the
outcome of the first die and hence $E_1$ and F cannot be independent.

Let $E_2$ be the event that the sum of the dice equals seven. Is $E_2$ independent of F?
The answer is yes since

$$P(E_2 F) = P(\{(4, 3)\}) = \tfrac{1}{36}$$

while

$$P(E_2)P(F) = \tfrac{1}{6} \cdot \tfrac{1}{6} = \tfrac{1}{36}$$

We leave it for you to present the intuitive argument why the event that the sum of
the dice equals seven is independent of the outcome on the first die. ■
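The two independence checks in Example 1.9 amount to comparing P(EF) with P(E)P(F) using exact fractions. A minimal sketch, with helper names of our own choosing:

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event over 36 equally likely dice outcomes."""
    return Fraction(len(event), len(space))

def independent(A, B):
    """True when P(AB) = P(A)P(B)."""
    return prob(A & B) == prob(A) * prob(B)

F = {o for o in space if o[0] == 4}      # first die equals four
E1 = {o for o in space if sum(o) == 6}   # sum is six
E2 = {o for o in space if sum(o) == 7}   # sum is seven

print(independent(E1, F), independent(E2, F))  # False True
```

Exact rational arithmetic is used here because comparing floating-point products for equality could give misleading results.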

The definition of independence can be extended to more than two events. The events
$E_1, E_2, \ldots, E_n$ are said to be independent if for every subset $E_{1'}, E_{2'}, \ldots, E_{r'}$, r ≤ n,
of these events

$$P(E_{1'} E_{2'} \cdots E_{r'}) = P(E_{1'}) P(E_{2'}) \cdots P(E_{r'})$$

Intuitively, the events $E_1, E_2, \ldots, E_n$ are independent if knowledge of the occurrence
of any of these events has no effect on the probability of any other event.

Example 1.10 (Pairwise Independent Events That Are Not Independent) Let
a ball be drawn from an urn containing four balls, numbered 1, 2, 3, 4. Let E =
{1, 2}, F = {1, 3}, G = {1, 4}. If all four outcomes are assumed equally likely, then

$$\begin{aligned}
P(EF) &= P(E)P(F) = \tfrac{1}{4}, \\
P(EG) &= P(E)P(G) = \tfrac{1}{4}, \\
P(FG) &= P(F)P(G) = \tfrac{1}{4}
\end{aligned}$$

However,

$$\tfrac{1}{4} = P(EFG) \neq P(E)P(F)P(G)$$

Hence, even though the events E, F, G are pairwise independent, they are not jointly
independent. ■
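The distinction between pairwise and joint independence in Example 1.10 can be checked mechanically. The sketch below assumes the four outcomes are equally likely, as in the example.

```python
from fractions import Fraction
from itertools import combinations

space = {1, 2, 3, 4}
E, F, G = {1, 2}, {1, 3}, {1, 4}

def prob(event):
    """Probability of an event over four equally likely outcomes."""
    return Fraction(len(event), len(space))

# Every pair satisfies the product rule ...
pairwise = all(prob(A & B) == prob(A) * prob(B)
               for A, B in combinations([E, F, G], 2))
# ... but the triple does not: P(EFG) = 1/4 while P(E)P(F)P(G) = 1/8.
joint = prob(E & F & G) == prob(E) * prob(F) * prob(G)

print(pairwise, joint)  # True False
```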

Example 1.11 There are r players, with player i initially having $n_i$ units, $n_i > 0$, i =
1, ..., r. At each stage, two of the players are chosen to play a game, with the winner
of the game receiving 1 unit from the loser. Any player whose fortune drops to 0 is
eliminated, and this continues until a single player has all $n = \sum_{i=1}^{r} n_i$ units, with
that player designated as the victor. Assuming that the results of successive games are
independent, and that each game is equally likely to be won by either of its two players,
find the probability that player i is the victor.

Solution: To begin, suppose that there are n players, with each player initially
having 1 unit. Consider player i. Each stage she plays will be equally likely to
result in her either winning or losing 1 unit, with the results from each stage being
independent. In addition, she will continue to play stages until her fortune becomes
either 0 or n. Because this is the same for all players, it follows that each player has the
same chance of being the victor. Consequently, each player has probability
1/n of being the victor. Now, suppose these n players are divided into r teams,
with team i containing $n_i$ players, i = 1, ..., r. That is, suppose players $1, \ldots, n_1$
constitute team 1, players $n_1 + 1, \ldots, n_1 + n_2$ constitute team 2, and so on. Then the
probability that the victor is a member of team i is $n_i/n$. But because team i initially
has a total fortune of $n_i$ units, i = 1, ..., r, and each game played by members
of different teams results in the fortune of the winner’s team increasing by 1 and
that of the loser’s team decreasing by 1, it is easy to see that the probability that
the victor is from team i is exactly the desired probability. Moreover, our argument
also shows that the result is true no matter how the choices of the players in each
stage are made. ■
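The conclusion of Example 1.11 can be explored with a small Monte Carlo experiment. The sketch below assumes one particular selection rule, namely that the two players at each stage are chosen uniformly at random from those still in the game; the starting fortunes, run count, seed, and function names are likewise arbitrary choices for illustration. The estimates it prints are approximations, not exact values.

```python
import random

def simulate_victor(fortunes, rng):
    """Play one full game; return the index of the player who ends with all units."""
    fortunes = list(fortunes)
    alive = [i for i, f in enumerate(fortunes) if f > 0]
    while len(alive) > 1:
        a, b = rng.sample(alive, 2)   # assumed rule: pick two players uniformly
        winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
        fortunes[winner] += 1
        fortunes[loser] -= 1
        if fortunes[loser] == 0:      # eliminated players never play again
            alive.remove(loser)
    return alive[0]

def estimate_win_probs(fortunes, runs=20_000, seed=1):
    """Estimate each player's chance of being the victor by repeated simulation."""
    rng = random.Random(seed)
    wins = [0] * len(fortunes)
    for _ in range(runs):
        wins[simulate_victor(fortunes, rng)] += 1
    return [w / runs for w in wins]

start = [1, 2, 3]                      # n_1, n_2, n_3 with n = 6
estimates = estimate_win_probs(start)
print(estimates)                       # compare with 1/6, 2/6, 3/6
```

Other selection rules could be substituted in `simulate_victor`; by the argument in the solution, the estimates should stay close to the same values.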

Suppose that a sequence of experiments, each of which results in either a “success”
or a “failure,” is to be performed. Let $E_i$, i ≥ 1, denote the event that the ith experiment
results in a success. If, for all $i_1, i_2, \ldots, i_n$,

$$P(E_{i_1} E_{i_2} \cdots E_{i_n}) = \prod_{j=1}^{n} P(E_{i_j})$$

we say that the sequence of experiments consists of independent trials.


1.6 Bayes' Formula 

Let E and F be events. We may express E as

$$E = EF \cup EF^c$$

because in order for a point to be in E, it must either be in both E and F, or it must be
in E and not in F. Since EF and $EF^c$ are mutually exclusive, we have that

$$\begin{aligned}
P(E) &= P(EF) + P(EF^c) \\
&= P(E|F)P(F) + P(E|F^c)P(F^c) \\
&= P(E|F)P(F) + P(E|F^c)(1 - P(F))
\end{aligned} \tag{1.7}$$

Equation (1.7) states that the probability of the event E is a weighted average of the 
conditional probability of E given that F has occurred and the conditional probability 
of E given that F has not occurred, each conditional probability being given as much 
weight as the event on which it is conditioned has of occurring. 

Example 1.12 Consider two urns. The first contains two white and seven black balls, 
and the second contains five white and six black balls. We flip a fair coin and then 
draw a ball from the first urn or the second urn depending on whether the outcome 
was heads or tails. What is the conditional probability that the outcome of the toss was 
heads given that a white ball was selected?

Solution: Let W be the event that a white ball is drawn, and let H be the event
that the coin comes up heads. The desired probability $P(H \mid W)$ may be calculated
as follows:

\[
\begin{aligned}
P(H \mid W) &= \frac{P(HW)}{P(W)} = \frac{P(W \mid H)P(H)}{P(W)} \\
&= \frac{P(W \mid H)P(H)}{P(W \mid H)P(H) + P(W \mid H^c)P(H^c)} \\
&= \frac{\frac{2}{9} \cdot \frac{1}{2}}{\frac{2}{9} \cdot \frac{1}{2} + \frac{5}{11} \cdot \frac{1}{2}} = \frac{22}{67}
\end{aligned}
\]
■
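As a numerical check of Example 1.12, the following Python sketch (an illustration, not part of the original text) evaluates Equation (1.7) and the conditional probability with exact rational arithmetic. The names `urn_heads`, `urn_tails`, and `p_white` are chosen here for illustration only.

```python
from fractions import Fraction

# Urn contents from Example 1.12, stored as (white, black)
urn_heads = (2, 7)        # first urn, used when the coin shows heads
urn_tails = (5, 6)        # second urn, used when the coin shows tails
p_heads = Fraction(1, 2)  # fair coin


def p_white(urn):
    """Probability of drawing a white ball from the given urn."""
    white, black = urn
    return Fraction(white, white + black)


# Equation (1.7): P(W) = P(W|H)P(H) + P(W|H^c)P(H^c)
p_w = p_white(urn_heads) * p_heads + p_white(urn_tails) * (1 - p_heads)

# P(H|W) = P(W|H)P(H) / P(W)
p_h_given_w = p_white(urn_heads) * p_heads / p_w

print(p_w, p_h_given_w)
```

Using `Fraction` rather than floating point keeps the answer in the same form as a hand calculation.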


Example 1.13 In answering a question on a multiple-choice test a student either knows 
the answer or guesses. Let p be the probability that she knows the answer and $1 - p$ the
probability that she guesses. Assume that a student who guesses at the answer will be 
correct with probability 1/m, where m is the number of multiple-choice alternatives. 
What is the conditional probability that a student knew the answer to a question given 
that she answered it correctly? 


Solution: Let C and K denote respectively the event that the student answers the
question correctly and the event that she actually knows the answer.

Now

\[
\begin{aligned}
P(K \mid C) &= \frac{P(KC)}{P(C)} = \frac{P(C \mid K)P(K)}{P(C \mid K)P(K) + P(C \mid K^c)P(K^c)} \\
&= \frac{p}{p + (1/m)(1 - p)} \\
&= \frac{mp}{1 + (m - 1)p}
\end{aligned}
\]

Thus, for example, if $m = 5$, $p = \frac{1}{2}$, then the probability that a student knew the
answer to a question she correctly answered is $\frac{5}{6}$. ■

Example 1.14 A laboratory blood test is 95 percent effective in detecting a certain 
disease when it is, in fact, present. However, the test also yields a “false positive” result 
for 1 percent of the healthy persons tested. (That is, if a healthy person is tested, then, 
with probability 0.01, the test result will imply he has the disease.) If 0.5 percent of 
the population actually has the disease, what is the probability a person has the disease 
given that his test result is positive? 

Solution: Let D be the event that the tested person has the disease, and E the event
that his test result is positive. The desired probability $P(D \mid E)$ is obtained by

\[
\begin{aligned}
P(D \mid E) &= \frac{P(DE)}{P(E)} = \frac{P(E \mid D)P(D)}{P(E \mid D)P(D) + P(E \mid D^c)P(D^c)} \\
&= \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} \\
&= \frac{95}{294} \approx 0.323
\end{aligned}
\]

Thus, only 32 percent of those persons whose test results are positive actually have
the disease. ■


Equation (1.7) may be generalized in the following manner. Suppose that
$F_1, F_2, \ldots, F_n$ are mutually exclusive events such that $\bigcup_{i=1}^{n} F_i = S$. In other words,
exactly one of the events $F_1, F_2, \ldots, F_n$ will occur. By writing

\[
E = \bigcup_{i=1}^{n} EF_i
\]

and using the fact that the events $EF_i$, $i = 1, \ldots, n$, are mutually exclusive, we obtain
that

\[
\begin{aligned}
P(E) &= \sum_{i=1}^{n} P(EF_i) \\
     &= \sum_{i=1}^{n} P(E \mid F_i)P(F_i)
\end{aligned}
\tag{1.8}
\]

Thus, Equation (1.8) shows how, for given events $F_1, F_2, \ldots, F_n$ of which one and
only one must occur, we can compute P(E) by first “conditioning” upon which one of
the $F_i$ occurs. That is, it states that P(E) is equal to a weighted average of $P(E \mid F_i)$,
each term being weighted by the probability of the event on which it is conditioned.

Suppose now that E has occurred and we are interested in determining which one
of the $F_j$ also occurred. By Equation (1.8) we have that

\[
P(F_j \mid E) = \frac{P(EF_j)}{P(E)} = \frac{P(E \mid F_j)P(F_j)}{\sum_{i=1}^{n} P(E \mid F_i)P(F_i)}
\tag{1.9}
\]


Equation (1.9) is known as Bayes’ formula. 
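Equations (1.8) and (1.9) translate directly into a short computation. The Python sketch below is illustrative only; the function name `bayes_posterior` and its argument layout are assumptions made here, not notation from the text. It is applied to the blood test of Example 1.14, with $F_1 = D$ and $F_2 = D^c$.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return the list of P(F_j | E), given P(F_j) and P(E | F_j) for each j."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1 (exactly one F_j occurs)")
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(E F_j)
    p_e = sum(joint)                                      # Equation (1.8)
    return [j / p_e for j in joint]                       # Equation (1.9)


# Example 1.14: 0.5 percent prevalence, 95 percent detection, 1 percent false positives
priors = [Fraction(5, 1000), Fraction(995, 1000)]
likelihoods = [Fraction(95, 100), Fraction(1, 100)]
posterior = bayes_posterior(priors, likelihoods)

print(posterior[0], float(posterior[0]))
```

The first entry of `posterior` is $P(D \mid E)$; the entries always sum to 1 because the $F_j$ partition the sample space.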

Example 1.15 You know that a certain letter is equally likely to be in any one of three
different folders. Let $\alpha_i$ be the probability that you will find your letter upon making a
quick examination of folder i if the letter is, in fact, in folder i, $i = 1, 2, 3$. (We may
have $\alpha_i < 1$.) Suppose you look in folder 1 and do not find the letter. What is the
probability that the letter is in folder 1?

Solution: Let $F_i$, $i = 1, 2, 3$, be the event that the letter is in folder i; and let E
be the event that a search of folder 1 does not come up with the letter. We desire
$P(F_1 \mid E)$. From Bayes’ formula we obtain

\[
\begin{aligned}
P(F_1 \mid E) &= \frac{P(E \mid F_1)P(F_1)}{\sum_{i=1}^{3} P(E \mid F_i)P(F_i)} \\
&= \frac{(1 - \alpha_1)\frac{1}{3}}{(1 - \alpha_1)\frac{1}{3} + \frac{1}{3} + \frac{1}{3}} = \frac{1 - \alpha_1}{3 - \alpha_1}
\end{aligned}
\]
■

Exercises

1. A box contains three marbles: one red, one green, and one blue. Consider an 
experiment that consists of taking one marble from the box then replacing it in 
the box and drawing a second marble from the box. What is the sample space? 
If, at all times, each marble in the box is equally likely to be selected, what is the 
probability of each point in the sample space? 

*2. Repeat Exercise 1 when the second marble is drawn without replacing the first 
marble. 

3. A coin is to be tossed until a head appears twice in a row. What is the sample 
space for this experiment? If the coin is fair, what is the probability that it will be 
tossed exactly four times? 

4. Let E, F, G be three events. Find expressions for the events that of E, F, G

(a) only F occurs, 

(b) both E and F but not G occur, 

(c) at least one event occurs, 

(d) at least two events occur, 

(e) all three events occur, 

(f) none occurs, 

(g) at most one occurs, 

(h) at most two occur. 

*5. An individual uses the following gambling system at Las Vegas. He bets $1 that 
the roulette wheel will come up red. If he wins, he quits. If he loses then he makes 
the same bet a second time only this time he bets $2; and then regardless of the 
outcome, quits. Assuming that he has a probability of $\frac{1}{2}$ of winning each bet, what
is the probability that he goes home a winner? Why is this system not used by 
everyone? 

6. Show that $E(F \cup G) = EF \cup EG$.

7. Show that $(E \cup F)^c = E^c F^c$.

8. If $P(E) = 0.9$ and $P(F) = 0.8$, show that $P(EF) \geq 0.7$. In general, show that

\[
P(EF) \geq P(E) + P(F) - 1
\]

This is known as Bonferroni’s inequality.

*9. We say that $E \subset F$ if every point in E is also in F. Show that if $E \subset F$, then

\[
P(F) = P(E) + P(FE^c) \geq P(E)
\]

10. Show that

\[
P\left(\bigcup_{i=1}^{n} E_i\right) \leq \sum_{i=1}^{n} P(E_i)
\]

This is known as Boole’s inequality.

Hint: Either use Equation (1.2) and mathematical induction, or else show that
$\bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{n} F_i$, where $F_1 = E_1$, $F_i = E_i \bigcap_{j=1}^{i-1} E_j^c$, and use property (iii)
of a probability.

11. If two fair dice are tossed, what is the probability that the sum is i, $i = 2, 3, \ldots, 12$?

12. Let E and F be mutually exclusive events in the sample space of an experiment. 
Suppose that the experiment is repeated until either event E or event F occurs. 
What does the sample space of this new super experiment look like? Show that 
the probability that event E occurs before event F is $P(E)/[P(E) + P(F)]$.

Hint: Argue that the probability that the original experiment is performed n
times and E appears on the nth time is $P(E) \times (1 - p)^{n-1}$, $n = 1, 2, \ldots$, where
$p = P(E) + P(F)$. Add these probabilities to get the desired answer.

13. The dice game craps is played as follows. The player throws two dice, and if the 
sum is seven or eleven, then she wins. If the sum is two, three, or twelve, then 
she loses. If the sum is anything else, then she continues throwing until she either 
throws that number again (in which case she wins) or she throws a seven (in which 
case she loses). Calculate the probability that the player wins. 

14. The probability of winning on a single toss of the dice is p. A starts, and if he 
fails, he passes the dice to B, who then attempts to win on her toss. They continue 
tossing the dice back and forth until one of them wins. What are their respective 
probabilities of winning? 

15. Argue that $E = EF \cup EF^c$, $E \cup F = E \cup FE^c$.

16. Use Exercise 15 to show that $P(E \cup F) = P(E) + P(F) - P(EF)$.

*17. Suppose each of three persons tosses a coin. If the outcome of one of the tosses 
differs from the other outcomes, then the game ends. If not, then the persons start 
over and retoss their coins. Assuming fair coins, what is the probability that the 
game will end with the first round of tosses? If all three coins are biased and have 
probability $\frac{1}{4}$ of landing heads, what is the probability that the game will end at
the first round? 

18. Assume that each child who is born is equally likely to be a boy or a girl. If a 
family has two children, what is the probability that both are girls given that (a) 
the eldest is a girl, (b) at least one is a girl? 

*19. Two dice are rolled. What is the probability that at least one is a six? If the two 
faces are different, what is the probability that at least one is a six? 

20. Three dice are thrown. What is the probability the same number appears on exactly 
two of the three dice? 

21. Suppose that 5 percent of men and 0.25 percent of women are color-blind. A 
color-blind person is chosen at random. What is the probability of this person 
being male? Assume that there are an equal number of males and females. 

22. A and B play until one has 2 more points than the other. Assuming that each point 
is independently won by A with probability p, what is the probability they will 
play a total of $2n$ points? What is the probability that A will win?

23. For events $E_1, E_2, \ldots, E_n$ show that

\[
P(E_1 E_2 \cdots E_n) = P(E_1)P(E_2 \mid E_1)P(E_3 \mid E_1 E_2) \cdots P(E_n \mid E_1 \cdots E_{n-1})
\]

24. In an election, candidate A receives n votes and candidate B receives m votes,
where n > m. Assume that in the count of the votes all possible orderings of the
n + m votes are equally likely. Let $P_{n,m}$ denote the probability that from the first
vote on A is always in the lead. Find

(a) $P_{2,1}$ (b) $P_{3,1}$ (c) $P_{n,1}$ (d) $P_{3,2}$ (e) $P_{4,2}$

(f) $P_{n,2}$ (g) $P_{4,3}$ (h) $P_{5,3}$ (i) $P_{5,4}$

(j) Make a conjecture as to the value of $P_{n,m}$.

*25. Two cards are randomly selected from a deck of 52 playing cards. 

(a) What is the probability they constitute a pair (that is, that they are of the same 
denomination)? 

(b) What is the conditional probability they constitute a pair given that they are 
of different suits? 

26. A deck of 52 playing cards, containing all 4 aces, is randomly divided into 4 piles
of 13 cards each. Define events $E_1$, $E_2$, $E_3$, and $E_4$ as follows:

$E_1$ = {the first pile has exactly 1 ace},

$E_2$ = {the second pile has exactly 1 ace},

$E_3$ = {the third pile has exactly 1 ace},

$E_4$ = {the fourth pile has exactly 1 ace}

Use Exercise 23 to find $P(E_1 E_2 E_3 E_4)$, the probability that each pile has an ace.

*27. Suppose in Exercise 26 we had defined the events $E_i$, $i = 1, 2, 3, 4$, by

$E_1$ = {one of the piles contains the ace of spades},

$E_2$ = {the ace of spades and the ace of hearts are in different piles},

$E_3$ = {the ace of spades, the ace of hearts,
and the ace of diamonds are in different piles},

$E_4$ = {all 4 aces are in different piles}

Now use Exercise 23 to find $P(E_1 E_2 E_3 E_4)$, the probability that each pile has an
ace. Compare your answer with the one you obtained in Exercise 26.

28. If the occurrence of B makes A more likely, does the occurrence of A make B 
more likely? 

29. Suppose that $P(E) = 0.6$. What can you say about $P(E \mid F)$ when

(a) E and F are mutually exclusive?

(b) $E \subset F$?

(c) $F \subset E$?

*30. Bill and George go target shooting together. Both shoot at a target at the same time. 
Suppose Bill hits the target with probability 0.7, whereas George, independently, 
hits the target with probability 0.4.

(a) Given that exactly one shot hit the target, what is the probability that it was
George’s shot? 

(b) Given that the target is hit, what is the probability that George hit it? 

31. What is the conditional probability that the first die is six given that the sum of 
the dice is seven? 

*32. Suppose all n men at a party throw their hats in the center of the room. Each man 
then randomly selects a hat. Show that the probability that none of the n men 
selects his own hat is 

\[
\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^n}{n!}
\]

Note that as $n \to \infty$ this converges to $e^{-1}$. Is this surprising?

33. In a class there are four freshman boys, six freshman girls, and six sophomore 
boys. How many sophomore girls must be present if sex and class are to be 
independent when a student is selected at random? 

34. Mr. Jones has devised a gambling system for winning at roulette. When he bets, 
he bets on red, and places a bet only when the ten previous spins of the roulette 
have landed on a black number. He reasons that his chance of winning is quite 
large since the probability of eleven consecutive spins resulting in black is quite 
small. What do you think of this system? 

35. A fair coin is continually flipped. What is the probability that the first four flips are

(a) H, H, H, H?

(b) T, H, H, H?

(c) What is the probability that the pattern T, H, H, H occurs before the pattern
H, H, H, H?

36. Consider two boxes, one containing one black and one white marble, the other, 
two black and one white marble. A box is selected at random and a marble is 
drawn at random from the selected box. What is the probability that the marble 
is black? 

37. In Exercise 36, what is the probability that the first box was the one selected given 
that the marble is white? 

38. Urn 1 contains two white balls and one black ball, while urn 2 contains one white 
ball and five black balls. One ball is drawn at random from urn 1 and placed in urn 
2. A ball is then drawn from urn 2. It happens to be white. What is the probability 
that the transferred ball was white? 

39. Stores A, B, and C have 50, 75, and 100 employees, and, respectively, 50, 60, 
and 70 percent of these are women. Resignations are equally likely among all 
employees, regardless of sex. One employee resigns and this is a woman. What 
is the probability that she works in store C? 

*40. (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects 
one of the coins at random, and when he flips it, it shows heads. What is the 
probability that it is the fair coin? 

(b) Suppose that he flips the same coin a second time and again it shows heads. 
Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what
is the probability that it is the fair coin? 

41. In a certain species of rats, black dominates over brown. Suppose that a black rat 
with two black parents has a brown sibling. 

(a) What is the probability that this rat is a pure black rat (as opposed to being a 
hybrid with one black and one brown gene)? 

(b) Suppose that when the black rat is mated with a brown rat, all five of their 
offspring are black. Now, what is the probability that the rat is a pure black 
rat? 

42. There are three coins in a box. One is a two-headed coin, another is a fair coin, 
and the third is a biased coin that comes up heads 75 percent of the time. When 
one of the three coins is selected at random and flipped, it shows heads. What is 
the probability that it was the two-headed coin? 

43. The blue-eyed gene for eye color is recessive, meaning that both the eye genes of 
an individual must be blue for that individual to be blue eyed. Jo (F) and Joe (M) 
are both brown-eyed individuals whose mothers had blue eyes. Their daughter 
Flo, who has brown eyes, is expecting a child conceived with a blue-eyed man. 
What is the probability that this child will be blue eyed? 

44. Urn 1 has five white and seven black balls. Urn 2 has three white and twelve 
black balls. We flip a fair coin. If the outcome is heads, then a ball from urn 1 is 
selected, while if the outcome is tails, then a ball from urn 2 is selected. Suppose 
that a white ball is selected. What is the probability that the coin landed tails? 

*45. An urn contains b black balls and r red balls. One of the balls is drawn at random, 
but when it is put back in the urn c additional balls of the same color are put 
in with it. Now suppose that we draw another ball. Show that the probability 
that the first ball drawn was black given that the second ball drawn was red is 
b/(b + r + c). 

46. Three prisoners are informed by their jailer that one of them has been chosen at 
random to be executed, and the other two are to be freed. Prisoner A asks the jailer 
to tell him privately which of his fellow prisoners will be set free, claiming that 
there would be no harm in divulging this information, since he already knows that 
at least one will go free. The jailer refuses to answer this question, pointing out 
that if A knew which of his fellows were to be set free, then his own probability 
of being executed would rise from $\frac{1}{3}$ to $\frac{1}{2}$, since he would then be one of two
prisoners. What do you think of the jailer’s reasoning? 

47. For a fixed event B, show that the collection P(A \ B), defined for all events A, 
satisfies the three conditions for a probability. Conclude from this that 

\[
P(A \mid B) = P(A \mid BC)P(C \mid B) + P(A \mid BC^c)P(C^c \mid B)
\]

Then directly verify the preceding equation. 

*48. Sixty percent of the families in a certain community own their own car, thirty 
percent own their own home, and twenty percent own both their own car and their 
own home. If a family is randomly chosen, what is the probability that this family 
owns a car or a house but not both?

Random Variables



2.1 Random Variables 

It frequently occurs that in performing an experiment we are mainly interested in some 
functions of the outcome as opposed to the outcome itself. For instance, in tossing dice 
we are often interested in the sum of the two dice and are not really concerned about 
the actual outcome. That is, we may be interested in knowing that the sum is seven 
and not be concerned over whether the actual outcome was (1, 6) or (2, 5) or (3, 4) or
(4, 3) or (5, 2) or (6, 1). These quantities of interest, or more formally, these real-valued
functions defined on the sample space, are known as random variables. 

Since the value of a random variable is determined by the outcome of the experiment, 
we may assign probabilities to the possible values of the random variable. 

Example 2.1 Letting X denote the random variable that is defined as the sum of two
fair dice, we have

\[
\begin{aligned}
P\{X = 2\} &= P\{(1, 1)\} = \tfrac{1}{36}, \\
P\{X = 3\} &= P\{(1, 2), (2, 1)\} = \tfrac{2}{36}, \\
P\{X = 4\} &= P\{(1, 3), (2, 2), (3, 1)\} = \tfrac{3}{36}, \\
P\{X = 5\} &= P\{(1, 4), (2, 3), (3, 2), (4, 1)\} = \tfrac{4}{36}, \\
P\{X = 6\} &= P\{(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)\} = \tfrac{5}{36}, \\
P\{X = 7\} &= P\{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\} = \tfrac{6}{36}, \\
P\{X = 8\} &= P\{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)\} = \tfrac{5}{36}, \\
P\{X = 9\} &= P\{(3, 6), (4, 5), (5, 4), (6, 3)\} = \tfrac{4}{36}, \\
P\{X = 10\} &= P\{(4, 6), (5, 5), (6, 4)\} = \tfrac{3}{36}, \\
P\{X = 11\} &= P\{(5, 6), (6, 5)\} = \tfrac{2}{36}, \\
P\{X = 12\} &= P\{(6, 6)\} = \tfrac{1}{36}
\end{aligned}
\tag{2.1}
\]

In other words, the random variable X can take on any integral value between two and
twelve, and the probability that it takes on each value is given by Equation (2.1). Since
X must take on one of the values two through twelve, we must have

\[
1 = P\left\{\bigcup_{n=2}^{12} \{X = n\}\right\} = \sum_{n=2}^{12} P\{X = n\}
\]

which may be checked from Equation (2.1). ■
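The check suggested above can also be carried out by enumerating the sample space. The following Python sketch (an illustration added here, with variable names of our own choosing) lists the 36 equally likely outcomes, groups them by their sum, and confirms that the probabilities add to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely ordered pairs (first die, second die)
outcomes = list(product(range(1, 7), repeat=2))

# Count the outcomes giving each sum, then divide by the size of the sample space
counts = Counter(a + b for a, b in outcomes)
pmf = {s: Fraction(c, len(outcomes)) for s, c in sorted(counts.items())}

for s in pmf:
    print(f"P{{X = {s}}} = {counts[s]}/36")

print("total:", sum(pmf.values()))
```

Note that `Fraction` reduces to lowest terms, so the loop prints the raw counts over 36 to match the form used in Equation (2.1).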

Example 2.2 For a second example, suppose that our experiment consists of tossing
two fair coins. Letting Y denote the number of heads appearing, then Y is a random
variable taking on one of the values 0, 1, 2 with respective probabilities

\[
\begin{aligned}
P\{Y = 0\} &= P\{(T, T)\} = \tfrac{1}{4}, \\
P\{Y = 1\} &= P\{(T, H), (H, T)\} = \tfrac{2}{4}, \\
P\{Y = 2\} &= P\{(H, H)\} = \tfrac{1}{4}
\end{aligned}
\]

Of course, $P\{Y = 0\} + P\{Y = 1\} + P\{Y = 2\} = 1$. ■

Example 2.3 Suppose that we toss a coin having a probability p of coming up heads,
until the first head appears. Letting N denote the number of flips required, then assuming
that the outcomes of successive flips are independent, N is a random variable taking on
one of the values 1, 2, 3, ..., with respective probabilities

\[
\begin{aligned}
P\{N = 1\} &= P\{H\} = p, \\
P\{N = 2\} &= P\{(T, H)\} = (1 - p)p, \\
P\{N = 3\} &= P\{(T, T, H)\} = (1 - p)^2 p, \\
&\;\;\vdots \\
P\{N = n\} &= P\{(\underbrace{T, T, \ldots, T}_{n-1}, H)\} = (1 - p)^{n-1} p, \quad n \geq 1
\end{aligned}
\]

As a check, note that

\[
\begin{aligned}
P\left(\bigcup_{n=1}^{\infty} \{N = n\}\right) &= \sum_{n=1}^{\infty} P\{N = n\} \\
&= p \sum_{n=1}^{\infty} (1 - p)^{n-1} \\
&= \frac{p}{1 - (1 - p)} \\
&= 1
\end{aligned}
\]
■

Example 2.4 Suppose that our experiment consists of seeing how long a battery can
operate before wearing down. Suppose also that we are not primarily interested in the 
actual lifetime of the battery but are concerned only about whether or not the battery 
lasts at least two years. In this case, we may define the random variable I by 


\[
I = \begin{cases}
1, & \text{if the lifetime of battery is two or more years} \\
0, & \text{otherwise}
\end{cases}
\]

If E denotes the event that the battery lasts two or more years, then the random variable 
I is known as the indicator random variable for event E. (Note that I equals 1 or 0 
depending on whether or not E occurs.) ■ 

Example 2.5 Suppose that independent trials, each of which results in any of m possible
outcomes with respective probabilities $p_1, \ldots, p_m$, $\sum_{i=1}^{m} p_i = 1$, are continually
performed. Let X denote the number of trials needed until each outcome has occurred
at least once.

Rather than directly considering $P\{X = n\}$ we will first determine $P\{X > n\}$,
the probability that at least one of the outcomes has not yet occurred after n trials.
Letting $A_i$ denote the event that outcome i has not yet occurred after the first n trials,
$i = 1, \ldots, m$, then

\[
\begin{aligned}
P\{X > n\} &= P\left(\bigcup_{i=1}^{m} A_i\right) \\
&= \sum_{i=1}^{m} P(A_i) - \mathop{\sum\sum}_{i<j} P(A_i A_j) \\
&\quad + \mathop{\sum\sum\sum}_{i<j<k} P(A_i A_j A_k) - \cdots + (-1)^{m+1} P(A_1 \cdots A_m)
\end{aligned}
\]

Now, $P(A_i)$ is the probability that each of the first n trials results in a non-i outcome,
and so by independence

\[
P(A_i) = (1 - p_i)^n
\]

Similarly, $P(A_i A_j)$ is the probability that the first n trials all result in a non-i and non-j
outcome, and so

\[
P(A_i A_j) = (1 - p_i - p_j)^n
\]

As all of the other probabilities are similar, we see that

\[
\begin{aligned}
P\{X > n\} &= \sum_{i=1}^{m} (1 - p_i)^n - \mathop{\sum\sum}_{i<j} (1 - p_i - p_j)^n \\
&\quad + \mathop{\sum\sum\sum}_{i<j<k} (1 - p_i - p_j - p_k)^n - \cdots
\end{aligned}
\]

Since $P\{X = n\} = P\{X > n - 1\} - P\{X > n\}$, we see, upon using the algebraic
identity $(1 - a)^{n-1} - (1 - a)^n = a(1 - a)^{n-1}$, that

\[
\begin{aligned}
P\{X = n\} &= \sum_{i=1}^{m} p_i (1 - p_i)^{n-1} - \mathop{\sum\sum}_{i<j} (p_i + p_j)(1 - p_i - p_j)^{n-1} \\
&\quad + \mathop{\sum\sum\sum}_{i<j<k} (p_i + p_j + p_k)(1 - p_i - p_j - p_k)^{n-1} - \cdots
\end{aligned}
\]
■
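The inclusion-exclusion expression for $P\{X > n\}$ in Example 2.5 can be checked numerically on a small case. The Python sketch below is an illustration only: the function names and the outcome probabilities (0.5, 0.3, 0.2) are assumptions chosen here, not values from the text. It evaluates the alternating sum and compares it with a brute-force enumeration of all sequences of n trials.

```python
from itertools import combinations, product
from math import isclose


def prob_exceeds(n, probs):
    """P{X > n} by inclusion-exclusion over sets of outcomes that never occur."""
    m = len(probs)
    total = 0.0
    for size in range(1, m + 1):
        sign = (-1) ** (size + 1)
        for subset in combinations(probs, size):
            # All n trials avoid every outcome in the subset
            total += sign * (1 - sum(subset)) ** n
    return total


def prob_exceeds_brute(n, probs):
    """Same probability by enumerating all m**n sequences (small cases only)."""
    m = len(probs)
    total = 0.0
    for seq in product(range(m), repeat=n):
        if len(set(seq)) < m:  # at least one outcome has not yet occurred
            weight = 1.0
            for i in seq:
                weight *= probs[i]
            total += weight
    return total


probs = [0.5, 0.3, 0.2]  # illustrative values
for n in range(1, 7):
    print(n, prob_exceeds(n, probs), prob_exceeds_brute(n, probs))


def prob_equals(n, probs):
    """P{X = n} = P{X > n - 1} - P{X > n}."""
    return prob_exceeds(n - 1, probs) - prob_exceeds(n, probs)
```

The brute-force version grows as $m^n$ and is only practical for tiny cases, which is precisely why the closed-form alternating sum is useful.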

In all of the preceding examples, the random variables of interest took on either 
a finite or a countable number of possible values.* Such random variables are called 
discrete. However, there also exist random variables that take on a continuum of possible 
values. These are known as continuous random variables. One example is the random 
variable denoting the lifetime of a car, when the car’s lifetime is assumed to take on 
any value in some interval (a, b). 

The cumulative distribution function (cdf) (or more simply the distribution function)
$F(\cdot)$ of the random variable X is defined for any real number b, $-\infty < b < \infty$, by

\[
F(b) = P\{X \leq b\}
\]

In words, F(b) denotes the probability that the random variable X takes on a value that
is less than or equal to b. Some properties of the cdf F are

(i) F(b) is a nondecreasing function of b,

(ii) $\lim_{b \to \infty} F(b) = F(\infty) = 1$,

(iii) $\lim_{b \to -\infty} F(b) = F(-\infty) = 0$.

Property (i) follows since for a < b the event $\{X \leq a\}$ is contained in the event
$\{X \leq b\}$, and so it cannot have a larger probability. Properties (ii) and (iii) follow since
X must take on some finite value.

All probability questions about X can be answered in terms of the cdf $F(\cdot)$. For
example,

\[
P\{a < X \leq b\} = F(b) - F(a) \quad \text{for all } a < b
\]

This follows since we may calculate $P\{a < X \leq b\}$ by first computing the probability
that $X \leq b$ (that is, F(b)) and then subtracting from this the probability that $X \leq a$
(that is, F(a)).

If we desire the probability that X is strictly smaller than b, we may calculate this
probability by

\[
\begin{aligned}
P\{X < b\} &= \lim_{h \to 0^+} P\{X \leq b - h\} \\
&= \lim_{h \to 0^+} F(b - h)
\end{aligned}
\]

where $\lim_{h \to 0^+}$ means that we are taking the limit as h decreases to 0. Note that
$P\{X < b\}$ does not necessarily equal F(b) since F(b) also includes the probability
that X equals b.

* A set is countable if its elements can be put in a one-to-one correspondence with the sequence of positive 
integers.

2.2 Discrete Random Variables

As was previously mentioned, a random variable that can take on at most a countable 
number of possible values is said to be discrete. For a discrete random variable X, we 
define the probability mass function p(a) of X by

\[
p(a) = P\{X = a\}
\]

The probability mass function p(a) is positive for at most a countable number of values
of a. That is, if X must assume one of the values $x_1, x_2, \ldots$, then

\[
\begin{aligned}
p(x_i) &> 0, \quad i = 1, 2, \ldots \\
p(x) &= 0, \quad \text{all other values of } x
\end{aligned}
\]

Since X must take on one of the values $x_i$, we have

\[
\sum_{i=1}^{\infty} p(x_i) = 1
\]

The cumulative distribution function F can be expressed in terms of p(a) by

\[
F(a) = \sum_{\text{all } x_i \leq a} p(x_i)
\]

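For instance, suppose X has a probability mass function given by

\[
p(1) = \tfrac{1}{2}, \quad p(2) = \tfrac{1}{3}, \quad p(3) = \tfrac{1}{6}
\]

then the cumulative distribution function F of X is given by

\[
F(a) = \begin{cases}
0, & a < 1 \\
\tfrac{1}{2}, & 1 \leq a < 2 \\
\tfrac{5}{6}, & 2 \leq a < 3 \\
1, & 3 \leq a
\end{cases}
\]

This is graphically presented in Figure 2.1.

Figure 2.1 Graph of F(x). [Step function with jumps at x = 1, 2, 3, rising to the levels 1/2, 5/6, and 1.]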
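The step function of Figure 2.1 can be reproduced by summing the mass function. The following Python sketch is illustrative only; it assumes the mass function p(1) = 1/2, p(2) = 1/3, p(3) = 1/6, which is consistent with the levels 1/2 and 5/6 shown in the figure, and the helper names `cdf` and `prob_less_than` are our own.

```python
from fractions import Fraction

# Probability mass function of the example (assumed values, see lead-in)
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}


def cdf(a, pmf):
    """F(a): the sum of p(x_i) over all x_i <= a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))


def prob_less_than(b, pmf):
    """P{X < b}: the left limit of F at b, which excludes any mass at b itself."""
    return sum((p for x, p in pmf.items() if x < b), Fraction(0))


for a in (0.5, 1, 1.5, 2, 2.5, 3, 4):
    print(a, cdf(a, pmf))

# P{1 < X <= 2} = F(2) - F(1)
print(cdf(2, pmf) - cdf(1, pmf))
```

Comparing `cdf(2, pmf)` with `prob_less_than(2, pmf)` shows the jump of size p(2) at the point 2.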
Discrete random variables are often classified according to their probability mass
functions. We now consider some of these random variables. 

2.2.1 The Bernoulli Random Variable 

Suppose that a trial, or an experiment, whose outcome can be classified as either a 
“success” or as a “failure” is performed. If we let X equal 1 if the outcome is a success 
and 0 if it is a failure, then the probability mass function of X is given by 


\[
\begin{aligned}
p(0) &= P\{X = 0\} = 1 - p, \\
p(1) &= P\{X = 1\} = p
\end{aligned}
\tag{2.2}
\]

where $p$, $0 \leq p \leq 1$, is the probability that the trial is a “success.”

A random variable X is said to be a Bernoulli random variable if its probability mass
function is given by Equation (2.2) for some $p \in (0, 1)$.


2.2.2 The Binomial Random Variable 

Suppose that n independent trials, each of which results in a “success” with probability 
p and in a “failure” with probability $1 - p$, are to be performed. If X represents the
number of successes that occur in the n trials, then X is said to be a binomial random 
variable with parameters (n, p). 

The probability mass function of a binomial random variable having parameters
(n, p) is given by

\[
p(i) = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, 1, \ldots, n
\tag{2.3}
\]

where

\[
\binom{n}{i} = \frac{n!}{(n - i)!\, i!}
\]

equals the number of different groups of i objects that can be chosen from a set of n
objects. The validity of Equation (2.3) may be verified by first noting that the probability
of any particular sequence of the n outcomes containing i successes and $n - i$
failures is, by the assumed independence of trials, $p^i (1 - p)^{n-i}$. Equation (2.3) then
follows since there are $\binom{n}{i}$ different sequences of the n outcomes leading to i successes
and $n - i$ failures. For instance, if n = 3, i = 2, then there are $\binom{3}{2} = 3$
ways in which the three trials can result in two successes, namely, any one of the
three outcomes (s, s, f), (s, f, s), (f, s, s), where the outcome (s, s, f) means that
the first two trials are successes and the third a failure. Since each of the three outcomes
(s, s, f), (s, f, s), (f, s, s) has a probability $p^2(1 - p)$ of occurring, the desired
probability is thus $\binom{3}{2} p^2 (1 - p)$.

Note that, by the binomial theorem, the probabilities sum to one, that is,

\[
\sum_{i=0}^{\infty} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n-i} = (p + (1 - p))^n = 1
\]
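Equation (2.3) and the fact that its terms sum to one are easy to confirm by direct computation. The Python sketch below is an illustration; the function name `binomial_pmf` and the parameter values n = 4, p = 1/2 are choices made here. It uses the standard-library function `math.comb` for the binomial coefficient.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(i, n, p):
    """p(i) = C(n, i) p^i (1 - p)^(n - i), as in Equation (2.3)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


n, p = 4, Fraction(1, 2)  # illustrative parameters
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]

print(pmf)
print("total:", sum(pmf))
```

Passing `p` as a `Fraction` keeps every term exact, so the total is exactly 1 rather than a floating-point value close to 1.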


Example 2.6 Four fair coins are flipped. If the outcomes are assumed independent, 
what is the probability that two heads and two tails are obtained? 

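Solution: Letting X equal the number of heads (“successes”) that appear, then
X is a binomial random variable with parameters $(n = 4, p = \frac{1}{2})$. Hence, by
Equation (2.3),

\[
P\{X = 2\} = \binom{4}{2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^2 = \frac{3}{8}
\]
■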

Example 2.7 It is known that any item produced by a certain machine will be defective 
with probability 0.1, independently of any other item. What is the probability that in a 
sample of three items, at most one will be defective? 

Solution: If X is the number of defective items in the sample, then X is a binomial 
random variable with parameters (3, 0.1). Hence, the desired probability is given by 

\[
P\{X = 0\} + P\{X = 1\} = \binom{3}{0}(0.1)^0(0.9)^3 + \binom{3}{1}(0.1)^1(0.9)^2 = 0.972
\]
■


Example 2.8 Suppose that an airplane engine will fail, when in flight, with probability 
$1 - p$ independently from engine to engine; suppose that the airplane will make a
successful flight if at least 50 percent of its engines remain operative. For what values 
of p is a four-engine plane preferable to a two-engine plane? 

Solution: Because each engine is assumed to fail or function independently of what
happens with the other engines, it follows that the number of engines remaining
operative is a binomial random variable. Hence, the probability that a four-engine
plane makes a successful flight is

\[
\begin{aligned}
&\binom{4}{2} p^2 (1 - p)^2 + \binom{4}{3} p^3 (1 - p) + \binom{4}{4} p^4 (1 - p)^0 \\
&\qquad = 6p^2(1 - p)^2 + 4p^3(1 - p) + p^4
\end{aligned}
\]

whereas the corresponding probability for a two-engine plane is

\[
\binom{2}{1} p(1 - p) + \binom{2}{2} p^2 = 2p(1 - p) + p^2
\]

Hence the four-engine plane is safer if

\[
6p^2(1 - p)^2 + 4p^3(1 - p) + p^4 \geq 2p(1 - p) + p^2
\]

or equivalently if

\[
6p(1 - p)^2 + 4p^2(1 - p) + p^3 \geq 2 - p
\]

which simplifies to

\[
3p^3 - 8p^2 + 7p - 2 \geq 0 \quad \text{or} \quad (p - 1)^2(3p - 2) \geq 0
\]

which is equivalent to

\[
3p - 2 \geq 0 \quad \text{or} \quad p \geq \tfrac{2}{3}
\]

Hence, the four-engine plane is safer when the engine success probability is at
least as large as $\frac{2}{3}$, whereas the two-engine plane is safer when this probability falls
below $\frac{2}{3}$. ■
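The threshold of 2/3 in Example 2.8 can be confirmed by evaluating the two success probabilities directly. The Python sketch below is illustrative; the function name `success_prob` is an assumption made here. It sums the binomial probabilities of having at least half of the engines operative.

```python
from fractions import Fraction
from math import comb


def success_prob(engines, p):
    """P(at least 50 percent of the engines remain operative), each with probability p."""
    need = (engines + 1) // 2  # smallest number of engines that is at least half
    return sum(
        comb(engines, k) * p**k * (1 - p) ** (engines - k)
        for k in range(need, engines + 1)
    )


for p in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    print(p, success_prob(4, p), success_prob(2, p))
```

At p = 2/3 the two planes are equally safe; below that value the two-engine plane has the larger success probability, and above it the four-engine plane does.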

Example 2.9 Suppose that a particular trait of a person (such as eye color or left 
handedness) is classified on the basis of one pair of genes and suppose that d represents 
a dominant gene and r a recessive gene. Thus a person with dd genes is pure dominance,
one with rr is pure recessive, and one with rd is hybrid. The pure dominance and the 
hybrid are alike in appearance. Children receive one gene from each parent. If, with 
respect to a particular trait, two hybrid parents have a total of four children, what is the 
probability that exactly three of the four children have the outward appearance of the 
dominant gene? 

Solution: If we assume that each child is equally likely to inherit either of two genes
from each parent, the probabilities that the child of two hybrid parents will have
dd, rr, or rd pairs of genes are, respectively, $\frac{1}{4}$, $\frac{1}{4}$, $\frac{1}{2}$. Hence, because an offspring
will have the outward appearance of the dominant gene if its gene pair is either
dd or rd, it follows that the number of such children is binomially distributed with
parameters $(4, \frac{3}{4})$. Thus the desired probability is

\[
\binom{4}{3} \left(\tfrac{3}{4}\right)^3 \left(\tfrac{1}{4}\right)^1 = \frac{27}{64}
\]
■

Remark on Terminology If X is a binomial random variable with parameters (n, p),
then we say that X has a binomial distribution with parameters (n, p).

2.2.3 The Geometric Random Variable 

Suppose that independent trials, each having probability p of being a success, are 
performed until a success occurs. If we let X be the number of trials required until the 
first success, then X is said to be a geometric random variable with parameter p. Its 
probability mass function is given by 

\[
p(n) = P\{X = n\} = (1 - p)^{n-1} p, \quad n = 1, 2, \ldots
\tag{2.4}
\]

Equation (2.4) follows since in order for X to equal n it is necessary and sufficient that
the first $n - 1$ trials be failures and the nth trial a success; the probabilities multiply because
the outcomes of the successive trials are assumed to be independent.

To check that p(n) is a probability mass function, we note that

\[
\sum_{n=1}^{\infty} p(n) = p \sum_{n=1}^{\infty} (1 - p)^{n-1} = 1
\]


2.2.4 The Poisson Random Variable 

A random variable X, taking on one of the values 0, 1, 2, ..., is said to be a Poisson 
random variable with parameter $\lambda$, if for some $\lambda > 0$, 

\[ p(i) = P\{X = i\} = e^{-\lambda} \frac{\lambda^{i}}{i!}, \qquad i = 0, 1, \ldots \tag{2.5} \]

Equation (2.5) defines a probability mass function since 

\[ \sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^{i}}{i!} = e^{-\lambda} e^{\lambda} = 1 \]


The Poisson random variable has a wide range of applications in a diverse number of 
areas, as will be seen in Chapter 5. 

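An important property of the Poisson random variable is that it may be used to 
approximate a binomial random variable when the binomial parameter n is large and 
p is small. To see this, suppose that X is a binomial random variable with parameters 
(n, p), and let $\lambda = np$. Then 

\[ \begin{aligned}
P\{X = i\} &= \frac{n!}{(n-i)!\, i!}\, p^{i} (1-p)^{n-i} \\
&= \frac{n!}{(n-i)!\, i!} \left(\frac{\lambda}{n}\right)^{i} \left(1 - \frac{\lambda}{n}\right)^{n-i} \\
&= \frac{n(n-1)\cdots(n-i+1)}{n^{i}}\, \frac{\lambda^{i}}{i!}\, \frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{i}}
\end{aligned} \]

Now, for n large and p small, 

\[ \left(1 - \frac{\lambda}{n}\right)^{n} \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^{i}} \approx 1, \qquad \left(1 - \frac{\lambda}{n}\right)^{i} \approx 1 \]

Hence, for n large and p small, 

\[ P\{X = i\} \approx e^{-\lambda} \frac{\lambda^{i}}{i!} \]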
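The quality of the Poisson approximation to the binomial can be inspected numerically. The following is a minimal sketch, not part of the original text; the values n = 1000 and p = 0.003 and the function names are illustrative assumptions.

```python
import math

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters (n, p)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """P{X = i} for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam**i / math.factorial(i)

# Illustrative (assumed) parameters: n large, p small, lam = n * p.
n, p = 1000, 0.003
lam = n * p

# Largest pointwise gap between the two mass functions over i = 0..20.
max_gap = max(abs(binomial_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(21))
print(f"largest gap for i = 0..20: {max_gap:.6f}")
```

Increasing n while holding n * p fixed should shrink the gap further.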




Example 2.10 Suppose that the number of typographical errors on a single page of 
this book has a Poisson distribution with parameter $\lambda = 1$. Calculate the probability 
that there is at least one error on this page. 

Solution: 

\[ P\{X \geq 1\} = 1 - P\{X = 0\} = 1 - e^{-1} \approx 0.632 \] ■ 

Example 2.11 If the number of accidents occurring on a highway each day is a Poisson 
random variable with parameter $\lambda = 3$, what is the probability that no accidents occur 
today? 

Solution: 

\[ P\{X = 0\} = e^{-3} \approx 0.05 \] ■ 

Example 2.12 Consider an experiment that consists of counting the number of 
$\alpha$-particles given off in a one-second interval by one gram of radioactive material. If we 
know from past experience that, on the average, 3.2 such $\alpha$-particles are given off, what 
is a good approximation to the probability that no more than two $\alpha$-particles will appear? 

Solution: If we think of the gram of radioactive material as consisting of a large 
number n of atoms each of which has probability 3.2/n of disintegrating and sending 
off an $\alpha$-particle during the second considered, then we see that, to a very close 
approximation, the number of $\alpha$-particles given off will be a Poisson random variable 
with parameter $\lambda = 3.2$. Hence the desired probability is 

\[ P\{X \leq 2\} = e^{-3.2} + 3.2\, e^{-3.2} + \frac{(3.2)^{2}}{2}\, e^{-3.2} \approx 0.380 \] ■ 
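The three Poisson calculations in Examples 2.10 through 2.12 can be reproduced directly from Equation (2.5). This is a small sketch; the helper names `poisson_pmf` and `poisson_cdf` are assumptions introduced here for illustration.

```python
import math

def poisson_pmf(i, lam):
    """Equation (2.5): P{X = i} for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam**i / math.factorial(i)

def poisson_cdf(k, lam):
    """P{X <= k}, obtained by summing the mass function over 0..k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

at_least_one_error = 1 - poisson_pmf(0, 1.0)   # Example 2.10, about 0.632
no_accidents = poisson_pmf(0, 3.0)             # Example 2.11, about 0.050
at_most_two_particles = poisson_cdf(2, 3.2)    # Example 2.12, about 0.380

print(at_least_one_error, no_accidents, at_most_two_particles)
```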


2.3 Continuous Random Variables 


In this section, we shall concern ourselves with random variables whose set of possible 
values is uncountable. Let X be such a random variable. We say that X is a continuous 
random variable if there exists a nonnegative function f(x), defined for all real 
$x \in (-\infty, \infty)$, having the property that for any set B of real numbers 

\[ P\{X \in B\} = \int_{B} f(x)\, dx \tag{2.6} \]

The function f(x) is called the probability density function of the random variable X. 

In words, Equation (2.6) states that the probability that X will be in B may be 
obtained by integrating the probability density function over the set B. Since X must 
assume some value, f(x) must satisfy 

\[ 1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^{\infty} f(x)\, dx \]


All probability statements about X can be answered in terms of f(x). For instance, 
letting B = [a, b], we obtain from Equation (2.6) that 

\[ P\{a \leq X \leq b\} = \int_{a}^{b} f(x)\, dx \tag{2.7} \]

If we let a = b in the preceding, then 

\[ P\{X = a\} = \int_{a}^{a} f(x)\, dx = 0 \]

In words, this equation states that the probability that a continuous random variable 
will assume any particular value is zero. 

The relationship between the cumulative distribution $F(\cdot)$ and the probability density 
$f(\cdot)$ is expressed by 

\[ F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x)\, dx \]

Differentiating both sides of the preceding yields 

\[ \frac{d}{da} F(a) = f(a) \]

That is, the density is the derivative of the cumulative distribution function. A somewhat 
more intuitive interpretation of the density function may be obtained from 
Equation (2.7) as follows: 

\[ P\left\{a - \frac{\varepsilon}{2} \leq X \leq a + \frac{\varepsilon}{2}\right\} = \int_{a-\varepsilon/2}^{a+\varepsilon/2} f(x)\, dx \approx \varepsilon f(a) \]

when $\varepsilon$ is small. In other words, the probability that X will be contained in an interval 
of length $\varepsilon$ around the point a is approximately $\varepsilon f(a)$. From this, we see that f(a) is 
a measure of how likely it is that the random variable will be near a. 

There are several important continuous random variables that appear frequently in 
probability theory. The remainder of this section is devoted to a study of certain of 
these random variables. 


2.3.1 The Uniform Random Variable 

A random variable is said to be uniformly distributed over the interval (0, 1) if its 
probability density function is given by 

\[ f(x) = \begin{cases} 1, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \]

Note that the preceding is a density function since $f(x) \geq 0$ and 

\[ \int_{-\infty}^{\infty} f(x)\, dx = \int_{0}^{1} dx = 1 \]

Since f(x) > 0 only when $x \in (0, 1)$, it follows that X must assume a value in (0, 1). 
Also, since f(x) is constant for $x \in (0, 1)$, X is just as likely to be “near” any value in 
(0, 1) as any other value. To check this, note that, for any 0 < a < b < 1, 

\[ P\{a \leq X \leq b\} = \int_{a}^{b} f(x)\, dx = b - a \]

In other words, the probability that X is in any particular subinterval of (0, 1) equals 
the length of that subinterval. 

In general, we say that X is a uniform random variable on the interval $(\alpha, \beta)$ if its 
probability density function is given by 

\[ f(x) = \begin{cases} \dfrac{1}{\beta - \alpha}, & \text{if } \alpha < x < \beta \\ 0, & \text{otherwise} \end{cases} \tag{2.8} \]


Example 2.13 Calculate the cumulative distribution function of a random variable 
uniformly distributed over $(\alpha, \beta)$. 

Solution: Since $F(a) = \int_{-\infty}^{a} f(x)\, dx$, we obtain from Equation (2.8) that 

\[ F(a) = \begin{cases} 0, & a \leq \alpha \\ \dfrac{a - \alpha}{\beta - \alpha}, & \alpha < a < \beta \\ 1, & a \geq \beta \end{cases} \] ■ 


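Example 2.14 If X is uniformly distributed over (0, 10), calculate the probability 
that (a) X < 3, (b) X > 7, (c) 1 < X < 6. 

Solution: 

\[ \begin{aligned}
P\{X < 3\} &= \frac{\int_{0}^{3} dx}{10} = \frac{3}{10}, \\
P\{X > 7\} &= \frac{\int_{7}^{10} dx}{10} = \frac{3}{10}, \\
P\{1 < X < 6\} &= \frac{\int_{1}^{6} dx}{10} = \frac{1}{2}
\end{aligned} \]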


2.3.2 Exponential Random Variables 


A continuous random variable whose probability density function is given, for some 
$\lambda > 0$, by 

\[ f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases} \]

is said to be an exponential random variable with parameter $\lambda$. These random variables 
will be extensively studied in Chapter 5, so we will content ourselves here with just 
calculating the cumulative distribution function F: 

\[ F(a) = \int_{0}^{a} \lambda e^{-\lambda x}\, dx = 1 - e^{-\lambda a}, \qquad a \geq 0 \]

Note that $F(\infty) = \int_{0}^{\infty} \lambda e^{-\lambda x}\, dx = 1$, as, of course, it must. 


2.3.3 Gamma Random Variables 


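A continuous random variable whose density is given by 

\[ f(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases} \]

for some $\lambda > 0$, $\alpha > 0$ is said to be a gamma random variable with parameters $\alpha$, $\lambda$. 
The quantity $\Gamma(\alpha)$ is called the gamma function and is defined by 

\[ \Gamma(\alpha) = \int_{0}^{\infty} e^{-x} x^{\alpha - 1}\, dx \]

It is easy to show by induction that for integral $\alpha$, say, $\alpha = n$, 

\[ \Gamma(n) = (n-1)! \]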
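As a quick numerical illustration (not part of the original text), the identity for integral arguments can be compared against Python's built-in `math.gamma`, and the gamma density can be integrated numerically to confirm that it has total mass close to 1. The parameter values 2.5 and 1.5, the integration range, and the helper names are assumptions chosen for the sketch.

```python
import math

def gamma_density(x, alpha, lam):
    """Gamma density with parameters alpha, lam as defined above."""
    if x < 0:
        return 0.0
    return lam * math.exp(-lam * x) * (lam * x) ** (alpha - 1) / math.gamma(alpha)

def midpoint_integral(f, lo, hi, steps=100_000):
    """Midpoint Riemann sum of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (k + 0.5) * width) for k in range(steps))

# Gamma(n) = (n - 1)! for n = 1, ..., 10.
factorial_identity_holds = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11)
)

# Total mass of the density for assumed parameters alpha = 2.5, lam = 1.5;
# the tail beyond x = 40 is negligible for these values.
total_mass = midpoint_integral(lambda x: gamma_density(x, 2.5, 1.5), 0.0, 40.0)
print(factorial_identity_holds, total_mass)
```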


2.3.4 Normal Random Variables 


We say that X is a normal random variable (or simply that X is normally distributed) 
with parameters $\mu$ and $\sigma^2$ if the density of X is given by 

\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty \]

This density function is a bell-shaped curve that is symmetric around $\mu$ (see Figure 2.2). 

Figure 2.2 Normal density function. 

An important fact about normal random variables is that if X is normally distributed 
with parameters $\mu$ and $\sigma^2$ then $Y = \alpha X + \beta$ is normally distributed with parameters 
$\alpha\mu + \beta$ and $\alpha^2\sigma^2$. To prove this, suppose first that $\alpha > 0$ and note that $F_Y(\cdot)$,* the 
cumulative distribution function of the random variable Y, is given by 

\[ \begin{aligned}
F_Y(a) &= P\{Y \leq a\} \\
&= P\{\alpha X + \beta \leq a\} \\
&= P\left\{X \leq \frac{a - \beta}{\alpha}\right\} \\
&= F_X\left(\frac{a - \beta}{\alpha}\right) \\
&= \int_{-\infty}^{(a-\beta)/\alpha} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}\, dx \\
&= \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\alpha\sigma} \exp\left\{\frac{-(v - (\alpha\mu + \beta))^2}{2\alpha^2\sigma^2}\right\} dv
\end{aligned} \tag{2.9} \]

where the last equality is obtained by the change in variables $v = \alpha x + \beta$. However, 
since $F_Y(a) = \int_{-\infty}^{a} f_Y(v)\, dv$, it follows from Equation (2.9) that the probability 
density function $f_Y(\cdot)$ is given by 

\[ f_Y(v) = \frac{1}{\sqrt{2\pi}\,\alpha\sigma} \exp\left\{\frac{-(v - (\alpha\mu + \beta))^2}{2(\alpha\sigma)^2}\right\}, \qquad -\infty < v < \infty \]

Hence, Y is normally distributed with parameters $\alpha\mu + \beta$ and $(\alpha\sigma)^2$. A similar result 
is also true when $\alpha < 0$. 

* When there is more than one random variable under consideration, we shall denote the cumulative 
distribution function of a random variable Z by $F_Z(\cdot)$. Similarly, we shall denote the density of Z by $f_Z(\cdot)$. 

One implication of the preceding result is that if X is normally distributed with 
parameters $\mu$ and $\sigma^2$ then $Y = (X - \mu)/\sigma$ is normally distributed with parameters 0 
and 1. Such a random variable Y is said to have the standard or unit normal distribution. 
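The standardization result can be examined by simulation. The sketch below is illustrative only: the parameter values, sample size, and seed are assumptions. It checks the sample mean, the sample variance, and one value of the empirical distribution function against the standard normal, which is evidence for the claim rather than a proof of it.

```python
import random
import statistics

def standardize(samples, mu, sigma):
    """Apply Y = (X - mu) / sigma to each sample."""
    return [(x - mu) / sigma for x in samples]

rng = random.Random(0)          # fixed seed so the run is repeatable
mu, sigma = 5.0, 2.0            # assumed illustrative parameters
xs = [rng.gauss(mu, sigma) for _ in range(100_000)]
zs = standardize(xs, mu, sigma)

z_mean = statistics.fmean(zs)
z_var = statistics.pvariance(zs)

# Compare the empirical P{Y <= 1} with the standard normal value.
empirical_cdf_at_1 = sum(z <= 1.0 for z in zs) / len(zs)
exact_cdf_at_1 = statistics.NormalDist(0.0, 1.0).cdf(1.0)
print(z_mean, z_var, empirical_cdf_at_1, exact_cdf_at_1)
```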


2.4 Expectation of a Random Variable 

2.4.1 The Discrete Case 

If X is a discrete random variable having a probability mass function p(x), then the 
expected value of X is defined by 

\[ E[X] = \sum_{x:\, p(x) > 0} x\, p(x) \]

In other words, the expected value of X is a weighted average of the possible values 
that X can take on, each value being weighted by the probability that X assumes that 
value. For example, if the probability mass function of X is given by 

\[ p(1) = \tfrac{1}{2} = p(2) \]

then 

\[ E[X] = 1\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{2}\right) = \tfrac{3}{2} \]

is just an ordinary average of the two possible values 1 and 2 that X can assume. On 
the other hand, if 

\[ p(1) = \tfrac{1}{3}, \qquad p(2) = \tfrac{2}{3} \]

then 

\[ E[X] = 1\left(\tfrac{1}{3}\right) + 2\left(\tfrac{2}{3}\right) = \tfrac{5}{3} \]

is a weighted average of the two possible values 1 and 2 where the value 2 is given 
twice as much weight as the value 1 since p(2) = 2p(1). 

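Example 2.15 Find E[X] where X is the outcome when we roll a fair die. 

Solution: Since $p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \frac{1}{6}$, we obtain 

\[ E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \tfrac{7}{2} \] ■ 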

Example 2.16 (Expectation of a Bernoulli Random Variable) Calculate E[X] 
when X is a Bernoulli random variable with parameter p. 

Solution: Since p(0) = 1 - p, p(1) = p, we have 

\[ E[X] = 0(1 - p) + 1(p) = p \]

Thus, the expected number of successes in a single trial is just the probability that 
the trial will be a success. ■ 


Example 2.17 (Expectation of a Binomial Random Variable) Calculate E[X] when 
X is binomially distributed with parameters n and p. 

Solution: 

\[ \begin{aligned}
E[X] &= \sum_{i=0}^{n} i\, p(i) \\
&= \sum_{i=0}^{n} i \binom{n}{i} p^{i} (1-p)^{n-i} \\
&= \sum_{i=1}^{n} \frac{i\, n!}{(n-i)!\, i!}\, p^{i} (1-p)^{n-i} \\
&= \sum_{i=1}^{n} \frac{n!}{(n-i)!\, (i-1)!}\, p^{i} (1-p)^{n-i} \\
&= np \sum_{i=1}^{n} \frac{(n-1)!}{(n-i)!\, (i-1)!}\, p^{i-1} (1-p)^{n-i} \\
&= np \sum_{k=0}^{n-1} \binom{n-1}{k} p^{k} (1-p)^{n-1-k} \\
&= np\, [p + (1-p)]^{n-1} \\
&= np
\end{aligned} \]

where the sum over k is obtained by letting k = i - 1. Thus, the 
expected number of successes in n independent trials is n multiplied by the probability 
that a trial results in a success. ■ 

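Example 2.18 (Expectation of a Geometric Random Variable) Calculate the expectation 
of a geometric random variable having parameter p. 

Solution: By Equation (2.4), we have 

\[ \begin{aligned}
E[X] &= \sum_{n=1}^{\infty} n\, p (1-p)^{n-1} \\
&= p \sum_{n=1}^{\infty} n\, q^{n-1}
\end{aligned} \]

where q = 1 - p, 

\[ \begin{aligned}
E[X] &= p \sum_{n=1}^{\infty} \frac{d}{dq}\left(q^{n}\right) \\
&= p\, \frac{d}{dq}\left(\sum_{n=1}^{\infty} q^{n}\right) \\
&= p\, \frac{d}{dq}\left(\frac{q}{1-q}\right) \\
&= \frac{p}{(1-q)^{2}} \\
&= \frac{1}{p}
\end{aligned} \]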


In words, the expected number of independent trials we need to perform until we 
attain our first success equals the reciprocal of the probability that any one trial 
results in a success. ■ 

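Example 2.19 (Expectation of a Poisson Random Variable) Calculate E[X] if X 
is a Poisson random variable with parameter $\lambda$. 

Solution: From Equation (2.5), we have 

\[ \begin{aligned}
E[X] &= \sum_{i=0}^{\infty} \frac{i\, e^{-\lambda} \lambda^{i}}{i!} \\
&= \sum_{i=1}^{\infty} \frac{e^{-\lambda} \lambda^{i}}{(i-1)!} \\
&= \lambda e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} \\
&= \lambda e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^{k}}{k!} \\
&= \lambda e^{-\lambda} e^{\lambda} \\
&= \lambda
\end{aligned} \]

where we have used the identity $\sum_{k=0}^{\infty} \lambda^{k}/k! = e^{\lambda}$. ■ 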
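The closed forms np, 1/p, and $\lambda$ obtained in Examples 2.17 through 2.19 can be compared with the defining sum computed directly. This is a sketch under assumed parameter values; the infinite sums are truncated where the remaining terms are negligible, and the helper name `expectation` is introduced here for illustration.

```python
import math

def expectation(pmf, support):
    """Sum of x * p(x) over the given support (truncated if the support is infinite)."""
    return sum(x * pmf(x) for x in support)

n, p, lam = 10, 0.3, 2.5   # assumed illustrative parameters

binomial_mean = expectation(
    lambda i: math.comb(n, i) * p**i * (1 - p) ** (n - i), range(n + 1)
)
geometric_mean = expectation(
    lambda k: (1 - p) ** (k - 1) * p, range(1, 500)      # tail beyond 500 is negligible
)
poisson_mean = expectation(
    lambda i: math.exp(-lam) * lam**i / math.factorial(i), range(100)
)

print(binomial_mean, n * p)
print(geometric_mean, 1 / p)
print(poisson_mean, lam)
```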

2.4.2 The Continuous Case 

We may also define the expected value of a continuous random variable. This is done 
as follows. If X is a continuous random variable having a probability density function 
f(x), then the expected value of X is defined by 

\[ E[X] = \int_{-\infty}^{\infty} x f(x)\, dx \]


Example 2.20 (Expectation of a Uniform Random Variable) Calculate the expectation 
of a random variable uniformly distributed over $(\alpha, \beta)$. 

Solution: From Equation (2.8) we have 

\[ \begin{aligned}
E[X] &= \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha}\, dx \\
&= \frac{\beta^{2} - \alpha^{2}}{2(\beta - \alpha)} \\
&= \frac{\beta + \alpha}{2}
\end{aligned} \]

In other words, the expected value of a random variable uniformly distributed over 
the interval $(\alpha, \beta)$ is just the midpoint of the interval. ■ 

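Example 2.21 (Expectation of an Exponential Random Variable) Let X be exponentially 
distributed with parameter $\lambda$. Calculate E[X]. 

Solution: 

\[ E[X] = \int_{0}^{\infty} x \lambda e^{-\lambda x}\, dx \]

Integrating by parts ($dv = \lambda e^{-\lambda x}\, dx$, $u = x$) yields 

\[ \begin{aligned}
E[X] &= -x e^{-\lambda x} \Big|_{0}^{\infty} + \int_{0}^{\infty} e^{-\lambda x}\, dx \\
&= 0 - \frac{e^{-\lambda x}}{\lambda} \Big|_{0}^{\infty} \\
&= \frac{1}{\lambda}
\end{aligned} \]

Example 2.22 (Expectation of a Normal Random Variable) Calculate E[X] when 
X is normally distributed with parameters $\mu$ and $\sigma^2$. 

Solution: 

\[ E[X] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-(x-\mu)^2/2\sigma^2}\, dx \]

Writing x as $(x - \mu) + \mu$ yields 

\[ E[X] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu)\, e^{-(x-\mu)^2/2\sigma^2}\, dx + \mu\, \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\, dx \]

Letting $y = x - \mu$ leads to 

\[ E[X] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} y\, e^{-y^2/2\sigma^2}\, dy + \mu \int_{-\infty}^{\infty} f(x)\, dx \]

where f(x) is the normal density. By symmetry, the first integral must be 0, and so 

\[ E[X] = \mu \int_{-\infty}^{\infty} f(x)\, dx = \mu \] ■ 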


2.4.3 Expectation of a Function of a Random Variable 

Suppose now that we are given a random variable X and its probability distribution 
(that is, its probability mass function in the discrete case or its probability density 
function in the continuous case). Suppose also that we are interested in calculating not 
the expected value of X, but the expected value of some function of X, say, g(X). 
How do we go about doing this? One way is as follows. Since g(X) is itself a random 
variable, it must have a probability distribution, which should be computable from a 
knowledge of the distribution of X. Once we have obtained the distribution of g(X), 
we can then compute E[g(X)] by the definition of the expectation. 

Example 2.23 Suppose X has the following probability mass function: 

\[ p(0) = 0.2, \qquad p(1) = 0.5, \qquad p(2) = 0.3 \]

Calculate $E[X^2]$. 

Solution: Letting $Y = X^2$, we have that Y is a random variable that can take on 
one of the values $0^2$, $1^2$, $2^2$ with respective probabilities 

\[ \begin{aligned}
p_Y(0) &= P\{Y = 0^2\} = 0.2, \\
p_Y(1) &= P\{Y = 1^2\} = 0.5, \\
p_Y(4) &= P\{Y = 2^2\} = 0.3
\end{aligned} \]

Hence, 

\[ E[X^2] = E[Y] = 0(0.2) + 1(0.5) + 4(0.3) = 1.7 \]

Note that 

\[ 1.7 = E[X^2] \neq (E[X])^2 = 1.21 \] ■ 

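Example 2.24 Let X be uniformly distributed over (0, 1). Calculate $E[X^3]$. 

Solution: Letting $Y = X^3$, we calculate the distribution of Y as follows. For 
$0 \leq a \leq 1$, 

\[ \begin{aligned}
F_Y(a) &= P\{Y \leq a\} \\
&= P\{X^3 \leq a\} \\
&= P\{X \leq a^{1/3}\} \\
&= a^{1/3}
\end{aligned} \]

where the last equality follows since X is uniformly distributed over (0, 1). By 
differentiating $F_Y(a)$, we obtain the density of Y, namely, 

\[ f_Y(a) = \tfrac{1}{3}\, a^{-2/3}, \qquad 0 \leq a \leq 1 \]

Hence, 

\[ \begin{aligned}
E[X^3] = E[Y] &= \int_{-\infty}^{\infty} a f_Y(a)\, da \\
&= \int_{0}^{1} a\, \tfrac{1}{3}\, a^{-2/3}\, da \\
&= \tfrac{1}{3} \int_{0}^{1} a^{1/3}\, da \\
&= \tfrac{1}{3} \cdot \tfrac{3}{4}\, a^{4/3} \Big|_{0}^{1} \\
&= \tfrac{1}{4}
\end{aligned} \]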


While the foregoing procedure will, in theory, always enable us to compute the 
expectation of any function of X from a knowledge of the distribution of X, there is, 
fortunately, an easier way to do this. The following proposition shows how we can 
calculate the expectation of g(X) without first determining its distribution. 

Proposition 2.1 (a) If X is a discrete random variable with probability mass function 
p(x), then for any real-valued function g, 

\[ E[g(X)] = \sum_{x:\, p(x) > 0} g(x)\, p(x) \]

(b) If X is a continuous random variable with probability density function f(x), then 
for any real-valued function g, 

\[ E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx \]

Example 2.25 Applying the proposition to Example 2.23 yields 

\[ E[X^2] = 0^2(0.2) + (1^2)(0.5) + (2^2)(0.3) = 1.7 \]

which, of course, checks with the result derived in Example 2.23. ■ 

Example 2.26 Applying the proposition to Example 2.24 yields 

\[ E[X^3] = \int_{0}^{1} x^{3}\, dx \quad (\text{since } f(x) = 1,\ 0 < x < 1) \quad = \tfrac{1}{4} \] ■ 
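The two routes to $E[g(X)]$, first finding the distribution of g(X) and then applying the definition, or applying Proposition 2.1 directly, can be compared in a few lines. This is an illustrative sketch using the mass function of Example 2.23 and the uniform density of Example 2.24; the helper names and the midpoint-rule integrator are assumptions, not part of the text.

```python
from fractions import Fraction

# Mass function of Example 2.23, held as exact fractions.
pmf = {0: Fraction(2, 10), 1: Fraction(5, 10), 2: Fraction(3, 10)}

def expect(g, pmf):
    """Proposition 2.1(a): E[g(X)] as the sum of g(x) p(x)."""
    return sum(g(x) * px for x, px in pmf.items())

# Route 1: apply the proposition directly with g(x) = x^2.
second_moment = expect(lambda x: x**2, pmf)

# Route 2: build the mass function of Y = X^2 first, then take E[Y].
pmf_y = {}
for x, px in pmf.items():
    pmf_y[x**2] = pmf_y.get(x**2, 0) + px
second_moment_via_y = sum(y * py for y, py in pmf_y.items())

def midpoint_integral(g, lo, hi, steps=10_000):
    """Midpoint Riemann sum, standing in for the integral in Proposition 2.1(b)."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (k + 0.5) * width) for k in range(steps))

# Uniform (0, 1) has f(x) = 1 on (0, 1), so E[X^3] is the integral of x^3 over (0, 1).
third_moment_uniform = midpoint_integral(lambda x: x**3, 0.0, 1.0)

print(second_moment, second_moment_via_y, third_moment_uniform)
```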

A simple corollary of Proposition 2.1 is the following. 

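Corollary 2.2 If a and b are constants, then 

\[ E[aX + b] = aE[X] + b \]

Proof. In the discrete case, 

\[ \begin{aligned}
E[aX + b] &= \sum_{x:\, p(x) > 0} (ax + b)\, p(x) \\
&= a \sum_{x:\, p(x) > 0} x\, p(x) + b \sum_{x:\, p(x) > 0} p(x) \\
&= aE[X] + b
\end{aligned} \]

In the continuous case, 

\[ \begin{aligned}
E[aX + b] &= \int_{-\infty}^{\infty} (ax + b) f(x)\, dx \\
&= a \int_{-\infty}^{\infty} x f(x)\, dx + b \int_{-\infty}^{\infty} f(x)\, dx \\
&= aE[X] + b
\end{aligned} \]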


The expected value of a random variable X, E[X], is also referred to as the mean or 
the first moment of X. The quantity $E[X^n]$, $n \geq 1$, is called the nth moment of X. By 
Proposition 2.1, we note that 

\[ E[X^n] = \begin{cases} \displaystyle\sum_{x:\, p(x) > 0} x^{n} p(x), & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} x^{n} f(x)\, dx, & \text{if } X \text{ is continuous} \end{cases} \]

Another quantity of interest is the variance of a random variable X, denoted by 
Var(X), which is defined by 

\[ \mathrm{Var}(X) = E\left[(X - E[X])^2\right] \]

Thus, the variance of X measures the expected square of the deviation of X from its 
expected value. 

Example 2.27 (Variance of the Normal Random Variable) Let X be normally 
distributed with parameters $\mu$ and $\sigma^2$. Find Var(X). 

Solution: Recalling (see Example 2.22) that $E[X] = \mu$, we have that 

\[ \begin{aligned}
\mathrm{Var}(X) &= E[(X - \mu)^2] \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu)^2\, e^{-(x-\mu)^2/2\sigma^2}\, dx
\end{aligned} \]

Substituting $y = (x - \mu)/\sigma$ yields 

\[ \mathrm{Var}(X) = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-y^2/2}\, dy \]

Integrating by parts ($u = y$, $dv = y e^{-y^2/2}\, dy$) gives 

\[ \begin{aligned}
\mathrm{Var}(X) &= \frac{\sigma^2}{\sqrt{2\pi}} \left( -y e^{-y^2/2} \Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-y^2/2}\, dy \right) \\
&= \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\, dy \\
&= \sigma^2
\end{aligned} \]


Another derivation of Var(X) will be given in Example 2.42. 

Suppose that X is continuous with density f, and let $E[X] = \mu$. Then, 

\[ \begin{aligned}
\mathrm{Var}(X) &= E[(X - \mu)^2] \\
&= E[X^2 - 2\mu X + \mu^2] \\
&= \int_{-\infty}^{\infty} (x^2 - 2\mu x + \mu^2) f(x)\, dx \\
&= \int_{-\infty}^{\infty} x^2 f(x)\, dx - 2\mu \int_{-\infty}^{\infty} x f(x)\, dx + \mu^2 \int_{-\infty}^{\infty} f(x)\, dx \\
&= E[X^2] - 2\mu\mu + \mu^2 \\
&= E[X^2] - \mu^2
\end{aligned} \]

A similar proof holds in the discrete case, and so we obtain the useful identity 

\[ \mathrm{Var}(X) = E[X^2] - (E[X])^2 \]
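The identity can be confirmed exactly on a small discrete example by computing the variance both from its definition and from the two moments. The sketch below reuses the mass function of Example 2.23 with exact rational arithmetic; the function names are illustrative assumptions.

```python
from fractions import Fraction

def mean(pmf):
    """E[X] for a mass function given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_identity(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    second_moment = sum(x**2 * p for x, p in pmf.items())
    return second_moment - mean(pmf) ** 2

# Mass function of Example 2.23: p(0) = 0.2, p(1) = 0.5, p(2) = 0.3.
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 2: Fraction(3, 10)}

var_def = variance_by_definition(pmf)
var_id = variance_by_identity(pmf)
print(var_def, var_id)
```

Both routes give 1.7 - 1.21 = 0.49 here, in agreement with the moments found in Example 2.23.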

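Example 2.28 Calculate Var(X) when X represents the outcome when a fair die is 
rolled. 

Solution: As previously noted in Example 2.15, $E[X] = \frac{7}{2}$. Also, 

\[ E[X^2] = 1\left(\tfrac{1}{6}\right) + 2^2\left(\tfrac{1}{6}\right) + 3^2\left(\tfrac{1}{6}\right) + 4^2\left(\tfrac{1}{6}\right) + 5^2\left(\tfrac{1}{6}\right) + 6^2\left(\tfrac{1}{6}\right) = \left(\tfrac{1}{6}\right)(91) \]

Hence, 

\[ \mathrm{Var}(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^{2} = \frac{35}{12} \]


2.5 Jointly Distributed Random Variables 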

2.5.1 Joint Distribution Functions 

Thus far, we have concerned ourselves with the probability distribution of a single 
random variable. However, we are often interested in probability statements concerning 
two or more random variables. To deal with such probabilities, we define, for any two 
random variables X and Y, the joint cumulative probability distribution function of X 
and Y by 

\[ F(a, b) = P\{X \leq a, Y \leq b\}, \qquad -\infty < a, b < \infty \]

The distribution of X can be obtained from the joint distribution of X and Y as follows: 

\[ \begin{aligned}
F_X(a) &= P\{X \leq a\} \\
&= P\{X \leq a, Y < \infty\} \\
&= F(a, \infty)
\end{aligned} \]

Similarly, the cumulative distribution function of Y is given by 

\[ F_Y(b) = P\{Y \leq b\} = F(\infty, b) \]

In the case where X and Y are both discrete random variables, it is convenient to define 
the joint probability mass function of X and Y by 

\[ p(x, y) = P\{X = x, Y = y\} \]

The probability mass function of X may be obtained from p(x, y) by 

\[ p_X(x) = \sum_{y:\, p(x,y) > 0} p(x, y) \]

Similarly, 

\[ p_Y(y) = \sum_{x:\, p(x,y) > 0} p(x, y) \]

We say that X and Y are jointly continuous if there exists a function f(x, y), defined 
for all real x and y, having the property that for all sets A and B of real numbers 

\[ P\{X \in A, Y \in B\} = \int_{B} \int_{A} f(x, y)\, dx\, dy \]

The function f(x, y) is called the joint probability density function of X and Y. The 
probability density of X can be obtained from a knowledge of f(x, y) by the following 
reasoning: 

\[ \begin{aligned}
P\{X \in A\} &= P\{X \in A, Y \in (-\infty, \infty)\} \\
&= \int_{-\infty}^{\infty} \int_{A} f(x, y)\, dx\, dy \\
&= \int_{A} f_X(x)\, dx
\end{aligned} \]

where 

\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \]

is thus the probability density function of X. Similarly, the probability density function 
of Y is given by 

\[ f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx \]

Because 

\[ F(a, b) = P\{X \leq a, Y \leq b\} = \int_{-\infty}^{a} \int_{-\infty}^{b} f(x, y)\, dy\, dx \]

differentiation yields 

\[ \frac{\partial^{2}}{\partial a\, \partial b} F(a, b) = f(a, b) \]

Thus, as in the single variable case, differentiating the probability distribution function 
gives the probability density function. 

A variation of Proposition 2.1 states that if X and Y are random variables and g is 
a function of two variables, then 

\[ \begin{aligned}
E[g(X, Y)] &= \sum_{y} \sum_{x} g(x, y)\, p(x, y) && \text{in the discrete case} \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\, dx\, dy && \text{in the continuous case}
\end{aligned} \]

For example, if g(X, Y) = X + Y, then, in the continuous case, 

\[ \begin{aligned}
E[X + Y] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x + y) f(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y)\, dx\, dy + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f(x, y)\, dx\, dy \\
&= E[X] + E[Y]
\end{aligned} \]

where the first integral is evaluated by using the variation of Proposition 2.1 with 
g(x, y) = x, and the second with g(x, y) = y. 

The same result holds in the discrete case and, combined with the corollary in Section 
2.4.3, yields that for any constants a, b 

\[ E[aX + bY] = aE[X] + bE[Y] \tag{2.10} \]

Joint probability distributions may also be defined for n random variables. The details 
are exactly the same as when n = 2 and are left as an exercise. The corresponding 
result to Equation (2.10) states that if $X_1, X_2, \ldots, X_n$ are n random variables, then for 
any n constants $a_1, a_2, \ldots, a_n$, 

\[ E[a_1 X_1 + a_2 X_2 + \cdots + a_n X_n] = a_1 E[X_1] + a_2 E[X_2] + \cdots + a_n E[X_n] \tag{2.11} \]






Example 2.29 Calculate the expected sum obtained when three fair dice are rolled. 

Solution: Let X denote the sum obtained. Then $X = X_1 + X_2 + X_3$ where $X_i$ 
represents the value of the ith die. Thus, 

\[ E[X] = E[X_1] + E[X_2] + E[X_3] = 3\left(\tfrac{7}{2}\right) = \tfrac{21}{2} \] ■ 

Example 2.30 As another example of the usefulness of Equation (2.11), let us use 
it to obtain the expectation of a binomial random variable having parameters n and p. 
Recalling that such a random variable X represents the number of successes in n trials 
when each trial has probability p of being a success, we have 

\[ X = X_1 + X_2 + \cdots + X_n \]

where 

\[ X_i = \begin{cases} 1, & \text{if the } i\text{th trial is a success} \\ 0, & \text{if the } i\text{th trial is a failure} \end{cases} \]

Hence, $X_i$ is a Bernoulli random variable having expectation $E[X_i] = 1(p) + 0(1 - p) = p$. Thus, 

\[ E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = np \]

This derivation should be compared with the one presented in Example 2.17. ■ 

Example 2.31 At a party N men throw their hats into the center of a room. The hats 
are mixed up and each man randomly selects one. Find the expected number of men 
who select their own hats. 

Solution: Letting X denote the number of men that select their own hats, we can 
best compute E[X] by noting that 

\[ X = X_1 + X_2 + \cdots + X_N \]

where 

\[ X_i = \begin{cases} 1, & \text{if the } i\text{th man selects his own hat} \\ 0, & \text{otherwise} \end{cases} \]

Now, because the ith man is equally likely to select any of the N hats, it follows that 

\[ P\{X_i = 1\} = P\{i\text{th man selects his own hat}\} = \frac{1}{N} \]

and so 

\[ E[X_i] = 1 \cdot P\{X_i = 1\} + 0 \cdot P\{X_i = 0\} = \frac{1}{N} \]

Hence, from Equation (2.11) we obtain 

\[ E[X] = E[X_1] + \cdots + E[X_N] = \left(\frac{1}{N}\right) N = 1 \]


Hence, no matter how many people are at the party, on the average exactly one of 
the men will select his own hat. ■ 

Example 2.32 Suppose there are 25 different types of coupons and suppose that each 
time one obtains a coupon, it is equally likely to be any one of the 25 types. Compute 
the expected number of different types that are contained in a set of 10 coupons. 

Solution: Let X denote the number of different types in the set of 10 coupons. We 
compute E[X] by using the representation 

\[ X = X_1 + \cdots + X_{25} \]

where 

\[ X_i = \begin{cases} 1, & \text{if at least one type } i \text{ coupon is in the set of 10} \\ 0, & \text{otherwise} \end{cases} \]

Now, 

\[ \begin{aligned}
E[X_i] &= P\{X_i = 1\} \\
&= P\{\text{at least one type } i \text{ coupon is in the set of 10}\} \\
&= 1 - P\{\text{no type } i \text{ coupons are in the set of 10}\} \\
&= 1 - \left(\tfrac{24}{25}\right)^{10}
\end{aligned} \]

where the last equality follows since each of the 10 coupons will (independently) not 
be a type i with probability $\frac{24}{25}$. Hence, 

\[ E[X] = E[X_1] + \cdots + E[X_{25}] = 25\left[1 - \left(\tfrac{24}{25}\right)^{10}\right] \] ■ 
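The indicator-variable answers in Examples 2.31 and 2.32 lend themselves to a simulation check. The following is a rough sketch rather than part of the text: the party size of 20, the number of trials, the seed, and all function names are assumptions made for illustration.

```python
import random

def average_hat_matches(num_men, trials, rng):
    """Average number of men who get their own hat over repeated random shuffles."""
    hats = list(range(num_men))
    total = 0
    for _ in range(trials):
        rng.shuffle(hats)                       # hats[man] is the hat that man picks
        total += sum(1 for man, hat in enumerate(hats) if man == hat)
    return total / trials

def expected_distinct_types(num_types, set_size):
    """Closed form from Example 2.32: m * [1 - ((m - 1) / m)^k]."""
    return num_types * (1 - ((num_types - 1) / num_types) ** set_size)

def average_distinct_types(num_types, set_size, trials, rng):
    """Average number of distinct coupon types seen in a set of the given size."""
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(num_types) for _ in range(set_size)})
    return total / trials

rng = random.Random(1)                          # fixed seed for repeatability
hat_estimate = average_hat_matches(20, 50_000, rng)
coupon_exact = expected_distinct_types(25, 10)
coupon_estimate = average_distinct_types(25, 10, 50_000, rng)
print(hat_estimate, coupon_exact, coupon_estimate)
```

The hat estimate should sit near 1 whatever party size is chosen, which is the point of Example 2.31.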


2.5.2 Independent Random Variables 

The random variables X and Y are said to be independent if, for all a, b, 

\[ P\{X \leq a, Y \leq b\} = P\{X \leq a\}\, P\{Y \leq b\} \tag{2.12} \]

In other words, X and Y are independent if, for all a and b, the events $E_a = \{X \leq a\}$ 
and $F_b = \{Y \leq b\}$ are independent. 

In terms of the joint distribution function F of X and Y, we have that X and Y are 
independent if 

\[ F(a, b) = F_X(a)\, F_Y(b) \qquad \text{for all } a, b \]

When X and Y are discrete, the condition of independence reduces to 

\[ p(x, y) = p_X(x)\, p_Y(y) \tag{2.13} \]

while if X and Y are jointly continuous, independence reduces to 

\[ f(x, y) = f_X(x)\, f_Y(y) \tag{2.14} \]

To prove this statement, consider first the discrete version, and suppose that the joint 
probability mass function p(x, y) satisfies Equation (2.13). Then 

\[ \begin{aligned}
P\{X \leq a, Y \leq b\} &= \sum_{y \leq b} \sum_{x \leq a} p(x, y) \\
&= \sum_{y \leq b} \sum_{x \leq a} p_X(x)\, p_Y(y) \\
&= \sum_{y \leq b} p_Y(y) \sum_{x \leq a} p_X(x) \\
&= P\{Y \leq b\}\, P\{X \leq a\}
\end{aligned} \]

and so X and Y are independent. That Equation (2.14) implies independence in the 
continuous case is proven in the same manner and is left as an exercise. 

An important result concerning independence is the following. 

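Proposition 2.3 If X and Y are independent, then for any functions h and g 

\[ E[g(X) h(Y)] = E[g(X)]\, E[h(Y)] \]

Proof. Suppose that X and Y are jointly continuous. Then 

\[ \begin{aligned}
E[g(X) h(Y)] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x) h(y) f(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x) h(y) f_X(x) f_Y(y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} h(y) f_Y(y)\, dy \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \\
&= E[h(Y)]\, E[g(X)]
\end{aligned} \]

The proof in the discrete case is similar. 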
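For a concrete discrete instance of Proposition 2.3, consider two independent fair dice. The sketch below builds the joint mass function as the product of the marginals and compares both sides of the identity exactly; the choices $g(x) = x^2$ and h(y) equal to the indicator that y is even are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Marginal mass function of one fair die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] for a mass function given as {value: probability}."""
    return sum(g(x) * px for x, px in pmf.items())

def g(x):
    return x**2

def h(y):
    return 1 if y % 2 == 0 else 0   # indicator that the second die is even

# Under independence the joint mass function is the product of the marginals.
joint = {(x, y): px * py for (x, px), (y, py) in product(die.items(), die.items())}

lhs = sum(g(x) * h(y) * pxy for (x, y), pxy in joint.items())   # E[g(X) h(Y)]
rhs = expect(g, die) * expect(h, die)                            # E[g(X)] E[h(Y)]
print(lhs, rhs)
```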


2.5.3 Covariance and Variance of Sums of Random Variables 

The covariance of any two random variables X and Y, denoted by Cov(X, Y), is 
defined by 

\[ \begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])] \\
&= E[XY - Y E[X] - X E[Y] + E[X] E[Y]] \\
&= E[XY] - E[Y] E[X] - E[X] E[Y] + E[X] E[Y] \\
&= E[XY] - E[X] E[Y]
\end{aligned} \]

Note that if X and Y are independent, then by Proposition 2.3 it follows that 
Cov(X, Y) = 0. 






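Let us consider now the special case where X and Y are indicator variables for 
whether or not the events A and B occur. That is, for events A and B, define 

\[ X = \begin{cases} 1, & \text{if } A \text{ occurs} \\ 0, & \text{otherwise,} \end{cases} \qquad Y = \begin{cases} 1, & \text{if } B \text{ occurs} \\ 0, & \text{otherwise} \end{cases} \]

Then, 

\[ \mathrm{Cov}(X, Y) = E[XY] - E[X] E[Y] \]

and, because XY will equal 1 or 0 depending on whether or not both X and Y equal 1, 
we see that 

\[ \mathrm{Cov}(X, Y) = P\{X = 1, Y = 1\} - P\{X = 1\}\, P\{Y = 1\} \]

From this we see that 

\[ \begin{aligned}
\mathrm{Cov}(X, Y) > 0 &\Leftrightarrow P\{X = 1, Y = 1\} > P\{X = 1\}\, P\{Y = 1\} \\
&\Leftrightarrow \frac{P\{X = 1, Y = 1\}}{P\{X = 1\}} > P\{Y = 1\} \\
&\Leftrightarrow P\{Y = 1 \mid X = 1\} > P\{Y = 1\}
\end{aligned} \]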


That is, the covariance of X and Y is positive if the outcome X = 1 makes it more 
likely that Y = 1 (which, as is easily seen by symmetry, also implies the reverse). 

In general it can be shown that a positive value of Cov(X, Y) is an indication that Y 
tends to increase as X does, whereas a negative value indicates that Y tends to decrease 
as X increases. 

Example 2.33 The joint density function of X, Y is 

\[ f(x, y) = \frac{1}{y}\, e^{-(y + x/y)}, \qquad 0 < x, y < \infty \]

(a) Verify that the preceding is a joint density function. 

(b) Find Cov(X, Y). 

Solution: To show that f(x, y) is a joint density function we need to show it is 
nonnegative, which is immediate, and that $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dy\, dx = 1$. We prove 
the latter as follows: 

\[ \begin{aligned}
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dy\, dx &= \int_{0}^{\infty} \int_{0}^{\infty} \frac{1}{y}\, e^{-(y + x/y)}\, dy\, dx \\
&= \int_{0}^{\infty} e^{-y} \int_{0}^{\infty} \frac{1}{y}\, e^{-x/y}\, dx\, dy \\
&= \int_{0}^{\infty} e^{-y}\, dy \\
&= 1
\end{aligned} \]

To obtain Cov(X, Y), note that the density function of Y is 

\[ f_Y(y) = e^{-y} \int_{0}^{\infty} \frac{1}{y}\, e^{-x/y}\, dx = e^{-y} \]

Thus, Y is an exponential random variable with parameter 1, showing (see 
Example 2.21) that 

\[ E[Y] = 1 \]

We compute E[X] and E[XY] as follows: 

\[ \begin{aligned}
E[X] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y)\, dy\, dx \\
&= \int_{0}^{\infty} e^{-y} \int_{0}^{\infty} \frac{x}{y}\, e^{-x/y}\, dx\, dy
\end{aligned} \]

Now, $\int_{0}^{\infty} \frac{x}{y}\, e^{-x/y}\, dx$ is the expected value of an exponential random variable with 
parameter 1/y, and thus is equal to y. Consequently, 

\[ E[X] = \int_{0}^{\infty} y e^{-y}\, dy = 1 \]

Also 

\[ \begin{aligned}
E[XY] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f(x, y)\, dy\, dx \\
&= \int_{0}^{\infty} y e^{-y} \int_{0}^{\infty} \frac{x}{y}\, e^{-x/y}\, dx\, dy \\
&= \int_{0}^{\infty} y^{2} e^{-y}\, dy
\end{aligned} \]

Integration by parts ($dv = e^{-y}\, dy$, $u = y^2$) gives 

\[ E[XY] = \int_{0}^{\infty} y^{2} e^{-y}\, dy = -y^{2} e^{-y} \Big|_{0}^{\infty} + \int_{0}^{\infty} 2 y e^{-y}\, dy = 2E[Y] = 2 \]

Consequently, 

\[ \mathrm{Cov}(X, Y) = E[XY] - E[X] E[Y] = 1 \]
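The covariance found in Example 2.33 can be approximated by simulation. The sketch below relies on reading the joint density as the product of $e^{-y}$ and $(1/y)e^{-x/y}$: Y is exponential with parameter 1 and, given Y = y, X is exponential with mean y. Sampling X as y times a unit exponential is an equivalent scaling chosen here for convenience; the sample size, seed, and names are assumptions.

```python
import random

def sample_pair(rng):
    """Draw (X, Y) with Y ~ exponential(1) and X | Y = y ~ exponential with mean y."""
    y = rng.expovariate(1.0)
    x = y * rng.expovariate(1.0)   # scaling a unit exponential by y gives mean y
    return x, y

def sample_covariance(pairs):
    """Estimate Cov(X, Y) as mean(XY) - mean(X) * mean(Y)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    mean_xy = sum(x * y for x, y in pairs) / n
    return mean_xy - mean_x * mean_y

rng = random.Random(2)             # fixed seed for repeatability
pairs = [sample_pair(rng) for _ in range(200_000)]
cov_estimate = sample_covariance(pairs)
print(cov_estimate)
```

Because XY has a fairly heavy tail, the estimate converges slowly; a large sample is used for that reason.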
The following are important properties of covariance. 


Properties of Covariance 

For any random variables X, Y, Z and constant c, 

1. Cov(X, X) = Var(X), 

2. Cov(X, Y) = Cov(Y, X), 

3. Cov(cX, Y) = c Cov(X, Y), 

4. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z). 

Whereas the first three properties are immediate, the final one is easily proven as 
follows: 

\[ \begin{aligned}
\mathrm{Cov}(X, Y + Z) &= E[X(Y + Z)] - E[X] E[Y + Z] \\
&= E[XY] - E[X] E[Y] + E[XZ] - E[X] E[Z] \\
&= \mathrm{Cov}(X, Y) + \mathrm{Cov}(X, Z)
\end{aligned} \]

The fourth property listed easily generalizes to give the following result: 

\[ \mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{Cov}(X_i, Y_j) \tag{2.15} \]


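A useful expression for the variance of the sum of random variables can be obtained 
from Equation (2.15) as follows: 

\[ \begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) &= \mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{n} X_j\right) \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \mathrm{Cov}(X_i, X_i) + \sum_{i=1}^{n} \sum_{j \neq i} \mathrm{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum_{i=1}^{n} \sum_{j < i} \mathrm{Cov}(X_i, X_j)
\end{aligned} \tag{2.16} \]

If $X_i$, $i = 1, \ldots, n$ are independent random variables, then Equation (2.16) reduces to 

\[ \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) \]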
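Equation (2.16) can be checked exactly on a small dependent example. In the sketch below, three items are drawn without replacement from a population of two ones and three zeros, so the draws are negatively correlated; this population and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

population = [1, 1, 0, 0, 0]
n = 3

# All ordered samples of size n without replacement; each is equally likely.
outcomes = list(permutations(population, n))
weight = Fraction(1, len(outcomes))

def expect(g):
    """Expected value of g(sample) under the equally likely ordered samples."""
    return sum(g(w) * weight for w in outcomes)

def cov(i, j):
    """Cov(X_i, X_j), where X_i is the ith draw."""
    return expect(lambda w: w[i] * w[j]) - expect(lambda w: w[i]) * expect(lambda w: w[j])

# Left side of (2.16): variance of the sum computed directly.
direct = expect(lambda w: sum(w) ** 2) - expect(lambda w: sum(w)) ** 2

# Right side of (2.16): sum of variances plus twice the covariances with j < i.
via_formula = sum(cov(i, i) for i in range(n)) + 2 * sum(
    cov(i, j) for i in range(n) for j in range(i)
)
print(direct, via_formula)
```

Dropping the covariance terms would overstate the variance here, since each covariance is negative.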


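Definition 2.1 If $X_1, \ldots, X_n$ are independent and identically distributed, then the 
random variable $\bar{X} = \sum_{i=1}^{n} X_i / n$ is called the sample mean. 

The following proposition shows that the covariance between the sample mean and 
a deviation from that sample mean is zero. It will be needed in Section 2.6.1. 

Proposition 2.4 Suppose that $X_1, \ldots, X_n$ are independent and identically distributed 
with expected value $\mu$ and variance $\sigma^2$. Then, 

(a) $E[\bar{X}] = \mu$. 

(b) $\mathrm{Var}(\bar{X}) = \sigma^2/n$. 

(c) $\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0$, $i = 1, \ldots, n$. 

Proof. Parts (a) and (b) are easily established as follows: 

\[ \begin{aligned}
E[\bar{X}] &= \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu, \\
\mathrm{Var}(\bar{X}) &= \left(\frac{1}{n}\right)^{2} \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^{2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^{2}}{n}
\end{aligned} \]

To establish part (c) we reason as follows: 

\[ \begin{aligned}
\mathrm{Cov}(\bar{X}, X_i - \bar{X}) &= \mathrm{Cov}(\bar{X}, X_i) - \mathrm{Cov}(\bar{X}, \bar{X}) \\
&= \frac{1}{n}\, \mathrm{Cov}\left(X_i + \sum_{j \neq i} X_j,\ X_i\right) - \mathrm{Var}(\bar{X}) \\
&= \frac{1}{n}\, \mathrm{Cov}(X_i, X_i) + \frac{1}{n}\, \mathrm{Cov}\left(\sum_{j \neq i} X_j,\ X_i\right) - \frac{\sigma^{2}}{n} \\
&= \frac{\sigma^{2}}{n} - \frac{\sigma^{2}}{n} = 0
\end{aligned} \]

where the final equality used the fact that $X_i$ and $\sum_{j \neq i} X_j$ are independent and thus 
have covariance 0. ■ 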

Equation (2.16) is often useful when computing variances. 

Example 2.34 (Variance of a Binomial Random Variable) Compute the variance 
of a binomial random variable X with parameters n and p. 

Solution: Since such a random variable represents the number of successes in n 
independent trials when each trial has a common probability p of being a success, 
we may write 

\[ X = X_1 + \cdots + X_n \]

where the $X_i$ are independent Bernoulli random variables such that 

\[ X_i = \begin{cases} 1, & \text{if the } i\text{th trial is a success} \\ 0, & \text{otherwise} \end{cases} \]

Hence, from Equation (2.16) we obtain 

\[ \mathrm{Var}(X) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n) \]

But 

\[ \begin{aligned}
\mathrm{Var}(X_i) &= E[X_i^2] - (E[X_i])^2 \\
&= E[X_i] - (E[X_i])^2 \qquad \text{since } X_i^2 = X_i \\
&= p - p^2
\end{aligned} \]

and thus 

\[ \mathrm{Var}(X) = np(1 - p) \]


Example 2.35 (Sampling from a Finite Population: The Hypergeometric) Con¬ 
sider a population of N individuals, some of whom are in favor of a certain proposition. 
In particular suppose that Np of them are in favor and N — Np are opposed, where 
p is assumed to be unknown. We are interested in estimating p, the fraction of the 
population that is for the proposition, by randomly choosing and then determining the 
positions of n members of the population. 

In such situations as described in the preceding, it is common to use the fraction of 
the sampled population that is in favor of the proposition as an estimator of p. Hence, 
if we let 


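\[
X_i = \begin{cases} 1, & \text{if the $i$th person chosen is in favor} \\ 0, & \text{otherwise} \end{cases}
\]

then the usual estimator of $p$ is $\sum_{i=1}^{n} X_i / n$. Let us now compute its mean and variance.
Now,

\[
E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = np
\]

where the final equality follows since the $i$th person chosen is equally likely to be any
of the $N$ individuals in the population and so has probability $Np/N$ of being in favor. Also,

\[
\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum\sum_{i < j} \operatorname{Cov}(X_i, X_j)
\]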

Now, since $X_i$ is a Bernoulli random variable with mean $p$, it follows that

\[
\operatorname{Var}(X_i) = p(1 - p)
\]

Also, for $i \neq j$,

\[
\begin{aligned}
\operatorname{Cov}(X_i, X_j) &= E[X_i X_j] - E[X_i]E[X_j] \\
&= P\{X_i = 1, X_j = 1\} - p^2 \\
&= P\{X_i = 1\} P\{X_j = 1 \mid X_i = 1\} - p^2 \\
&= \frac{Np}{N} \cdot \frac{Np - 1}{N - 1} - p^2
\end{aligned}
\]

where the last equality follows since if the $i$th person to be chosen is in favor, then the
$j$th person chosen is equally likely to be any of the other $N - 1$ of which $Np - 1$ are
in favor. Thus, we see that


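\[
\begin{aligned}
\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) &= np(1 - p) + 2\binom{n}{2}\left[\frac{p(Np - 1)}{N - 1} - p^2\right] \\
&= np(1 - p) - \frac{n(n - 1)p(1 - p)}{N - 1}
\end{aligned}
\]

and so the mean and variance of our estimator are given by

\[
E\left[\sum_{i=1}^{n} \frac{X_i}{n}\right] = p,
\]
\[
\operatorname{Var}\left[\sum_{i=1}^{n} \frac{X_i}{n}\right] = \frac{p(1 - p)}{n} - \frac{(n - 1)p(1 - p)}{n(N - 1)}
\]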


Some remarks are in order: As the mean of the estimator is the unknown value $p$, we
would like its variance to be as small as possible (why is this?), and we see by the
preceding that, as a function of the population size $N$, the variance increases as $N$
increases. The limiting value, as $N \to \infty$, of the variance is $p(1 - p)/n$, which is
not surprising since for $N$ large each of the $X_i$ will be (approximately) independent
random variables, and thus $\sum_{i=1}^{n} X_i$ will have an (approximately) binomial distribution
with parameters $n$ and $p$.

The random variable $\sum_{i=1}^{n} X_i$ can be thought of as representing the number of white
balls obtained when $n$ balls are randomly selected from a population consisting of
$Np$ white and $N - Np$ black balls. (Identify a person who favors the proposition
with a white ball and one against with a black ball.) Such a random variable is called
hypergeometric and has a probability mass function given by

\[
P\left\{\sum_{i=1}^{n} X_i = k\right\} = \frac{\binom{Np}{k}\binom{N - Np}{n - k}}{\binom{N}{n}}
\]
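The mean and variance derived above can be checked directly against the hypergeometric mass function. The following is a minimal sketch using exact rational arithmetic; the function names and the numbers N = 20, Np = 8, n = 5 are illustrative assumptions, not values from the text.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N, favor, n, k):
    """P{sum X_i = k}: k in-favor people among n drawn without replacement."""
    return Fraction(comb(favor, k) * comb(N - favor, n - k), comb(N, n))


def mean_and_variance(N, favor, n):
    """Mean and variance of the in-favor count, computed directly from the pmf."""
    pmf = {k: hypergeom_pmf(N, favor, n, k) for k in range(0, n + 1)}
    mean = sum(k * q for k, q in pmf.items())
    second = sum(k * k * q for k, q in pmf.items())
    return mean, second - mean ** 2


# Illustrative (assumed) numbers: N = 20 people, Np = 8 in favor, sample n = 5.
N, favor, n = 20, 8, 5
p = Fraction(favor, N)
mean, var = mean_and_variance(N, favor, n)
# Variance of the sum as derived in Example 2.35.
formula_var = n * p * (1 - p) - Fraction(n * (n - 1), N - 1) * p * (1 - p)
print(mean == n * p, var == formula_var)
```

Because `Fraction` is exact, the comparison is an equality check rather than a floating-point approximation.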



It is often important to be able to calculate the distribution of X + Y from the 
distributions of X and Y when X and Y are independent. Suppose first that X and Y are 
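continuous, X having probability density $f$ and Y having probability density $g$. Then,
letting $F_{X+Y}(a)$ be the cumulative distribution function of $X + Y$, we have

\[
\begin{aligned}
F_{X+Y}(a) &= P\{X + Y \le a\} \\
&= \iint_{x + y \le a} f(x) g(y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{a - y} f(x) g(y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{a - y} f(x)\, dx \right) g(y)\, dy \\
&= \int_{-\infty}^{\infty} F_X(a - y) g(y)\, dy
\end{aligned}
\tag{2.17}
\]

The cumulative distribution function $F_{X+Y}$ is called the convolution of the distributions
$F_X$ and $F_Y$ (the cumulative distribution functions of X and Y, respectively).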












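By differentiating Equation (2.17), we obtain that the probability density function
$f_{X+Y}(a)$ of $X + Y$ is given by

\[
\begin{aligned}
f_{X+Y}(a) &= \frac{d}{da} \int_{-\infty}^{\infty} F_X(a - y) g(y)\, dy \\
&= \int_{-\infty}^{\infty} \frac{d}{da} \big( F_X(a - y) \big) g(y)\, dy \\
&= \int_{-\infty}^{\infty} f(a - y) g(y)\, dy
\end{aligned}
\tag{2.18}
\]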


Example 2.36 (Sum of Two Independent Uniform Random Variables) If X and Y
are independent random variables both uniformly distributed on (0, 1), then calculate
the probability density of X + Y.

Solution: From Equation (2.18), since

\[
f(a) = g(a) = \begin{cases} 1, & 0 < a < 1 \\ 0, & \text{otherwise} \end{cases}
\]

we obtain

\[
f_{X+Y}(a) = \int_0^1 f(a - y)\, dy
\]

For $0 \le a \le 1$, this yields

\[
f_{X+Y}(a) = \int_0^a dy = a
\]

For $1 < a < 2$, we get

\[
f_{X+Y}(a) = \int_{a-1}^{1} dy = 2 - a
\]

Hence,

\[
f_{X+Y}(a) = \begin{cases} a, & 0 \le a \le 1 \\ 2 - a, & 1 < a < 2 \\ 0, & \text{otherwise} \end{cases}
\]
■
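As a numerical sanity check on Example 2.36, the convolution integral of Equation (2.18) can be approximated by a midpoint rule and compared with the triangular density. This is only a sketch; the function names, the integration window, and the step count are assumptions made for illustration.

```python
def uniform_density(x):
    """Density of a uniform (0, 1) random variable."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def convolution_density(f, g, a, lo=-1.0, hi=2.0, steps=30000):
    """Midpoint-rule approximation of the integral of f(a - y) g(y) dy over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += f(a - y) * g(y)
    return total * h


def triangular_density(a):
    """Closed form obtained in Example 2.36."""
    if 0.0 <= a <= 1.0:
        return a
    if 1.0 < a < 2.0:
        return 2.0 - a
    return 0.0


for a in (0.25, 0.5, 1.0, 1.5, 1.9):
    approx = convolution_density(uniform_density, uniform_density, a)
    print(a, round(approx, 3), triangular_density(a))
```

The window [-1, 2] is wide enough here because g vanishes outside (0, 1); other densities would need a different window.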


Rather than deriving a general expression for the distribution of X + Y in the discrete 
case, we shall consider an example. 

Example 2.37 (Sums of Independent Poisson Random Variables) Let X and Y be
independent Poisson random variables with respective means $\lambda_1$ and $\lambda_2$. Calculate the
distribution of X + Y.

Solution: Since the event $\{X + Y = n\}$ may be written as the union of the disjoint
events $\{X = k, Y = n - k\}$, $0 \le k \le n$, we have

\[
\begin{aligned}
P\{X + Y = n\} &= \sum_{k=0}^{n} P\{X = k, Y = n - k\} \\
&= \sum_{k=0}^{n} P\{X = k\} P\{Y = n - k\} \\
&= \sum_{k=0}^{n} e^{-\lambda_1} \frac{\lambda_1^k}{k!}\, e^{-\lambda_2} \frac{\lambda_2^{n-k}}{(n - k)!} \\
&= e^{-(\lambda_1 + \lambda_2)} \sum_{k=0}^{n} \frac{\lambda_1^k \lambda_2^{n-k}}{k!\,(n - k)!} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} \sum_{k=0}^{n} \frac{n!}{k!\,(n - k)!} \lambda_1^k \lambda_2^{n-k} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} (\lambda_1 + \lambda_2)^n
\end{aligned}
\]

In words, $X + Y$ has a Poisson distribution with mean $\lambda_1 + \lambda_2$. ■
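The discrete convolution in Example 2.37 is easy to verify numerically. The sketch below sums over the disjoint events {X = k, Y = n - k} and compares the result with the Poisson mass function with mean equal to the sum of the two means; the means 1.5 and 2.25 are arbitrary illustrative choices.

```python
from math import exp, factorial


def poisson_pmf(lam, k):
    """P{X = k} for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def sum_pmf(lam1, lam2, n):
    """P{X + Y = n} by summing over the disjoint events {X = k, Y = n - k}."""
    return sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k) for k in range(n + 1))


lam1, lam2 = 1.5, 2.25  # illustrative means, not taken from the text
for n in range(6):
    print(n, sum_pmf(lam1, lam2, n), poisson_pmf(lam1 + lam2, n))
```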

The concept of independence may, of course, be defined for more than two random
variables. In general, the n random variables $X_1, X_2, \ldots, X_n$ are said to be independent
if, for all values $a_1, a_2, \ldots, a_n$,

\[
P\{X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n\} = P\{X_1 \le a_1\} P\{X_2 \le a_2\} \cdots P\{X_n \le a_n\}
\]


Example 2.38 Let $X_1, \ldots, X_n$ be independent and identically distributed continuous
random variables with probability distribution $F$ and density function $F' = f$. If we let
$X_{(i)}$ denote the $i$th smallest of these random variables, then $X_{(1)}, \ldots, X_{(n)}$ are called
the order statistics. To obtain the distribution of $X_{(i)}$, note that $X_{(i)}$ will be less than or
equal to $x$ if and only if at least $i$ of the $n$ random variables $X_1, \ldots, X_n$ are less than
or equal to $x$. Hence,

\[
P\{X_{(i)} \le x\} = \sum_{k=i}^{n} \binom{n}{k} (F(x))^k (1 - F(x))^{n-k}
\]

Differentiation yields that the density function of $X_{(i)}$ is as follows:

\[
\begin{aligned}
f_{X_{(i)}}(x) &= f(x) \sum_{k=i}^{n} \binom{n}{k} k (F(x))^{k-1} (1 - F(x))^{n-k} \\
&\quad - f(x) \sum_{k=i}^{n} \binom{n}{k} (n - k) (F(x))^{k} (1 - F(x))^{n-k-1} \\
&= f(x) \sum_{k=i}^{n} \frac{n!}{(n - k)!\,(k - 1)!} (F(x))^{k-1} (1 - F(x))^{n-k} \\
&\quad - f(x) \sum_{k=i}^{n-1} \frac{n!}{(n - k - 1)!\,k!} (F(x))^{k} (1 - F(x))^{n-k-1} \\
&= f(x) \sum_{k=i}^{n} \frac{n!}{(n - k)!\,(k - 1)!} (F(x))^{k-1} (1 - F(x))^{n-k} \\
&\quad - f(x) \sum_{j=i+1}^{n} \frac{n!}{(n - j)!\,(j - 1)!} (F(x))^{j-1} (1 - F(x))^{n-j} \\
&= \frac{n!}{(n - i)!\,(i - 1)!} f(x) (F(x))^{i-1} (1 - F(x))^{n-i}
\end{aligned}
\]

The preceding density is quite intuitive, since in order for $X_{(i)}$ to equal $x$, $i - 1$ of the
$n$ values $X_1, \ldots, X_n$ must be less than $x$; $n - i$ of them must be greater than $x$; and
one must be equal to $x$. Now, the probability density that every member of a specified
set of $i - 1$ of the $X_j$ is less than $x$, every member of another specified set of $n - i$ is
greater than $x$, and the remaining value is equal to $x$ is $(F(x))^{i-1} (1 - F(x))^{n-i} f(x)$.
Therefore, since there are $n!/[(i - 1)!\,(n - i)!]$ different partitions of the $n$ random
variables into the three groups, we obtain the preceding density function. ■
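The closed-form density of Example 2.38 can be compared with a numerical derivative of the binomial-tail expression for the distribution function. The sketch below assumes uniform (0, 1) samples, so that F(x) = x and f(x) = 1; the values n = 5, i = 2, x = 0.3 and the helper names are illustrative only.

```python
from math import comb, factorial


def order_stat_cdf(i, n, F):
    """P{X_(i) <= x} as a binomial tail, given F = F(x)."""
    return sum(comb(n, k) * F ** k * (1 - F) ** (n - k) for k in range(i, n + 1))


def order_stat_density(i, n, F, f):
    """Closed-form density of X_(i), given F = F(x) and f = f(x)."""
    coeff = factorial(n) / (factorial(n - i) * factorial(i - 1))
    return coeff * f * F ** (i - 1) * (1 - F) ** (n - i)


# Assumed illustration: uniform (0, 1) samples, so F(x) = x and f(x) = 1.
n, i, x, h = 5, 2, 0.3, 1e-6
numeric = (order_stat_cdf(i, n, x + h) - order_stat_cdf(i, n, x - h)) / (2 * h)
print(numeric, order_stat_density(i, n, x, 1.0))
```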


2.5.4 Joint Probability Distribution of Functions of Random Variables 

Let $X_1$ and $X_2$ be jointly continuous random variables with joint probability density
function $f(x_1, x_2)$. It is sometimes necessary to obtain the joint distribution of the
random variables $Y_1$ and $Y_2$ that arise as functions of $X_1$ and $X_2$. Specifically, suppose
that $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ for some functions $g_1$ and $g_2$.
Assume that the functions $g_1$ and $g_2$ satisfy the following conditions:

1. The equations $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$ can be uniquely solved for $x_1$
and $x_2$ in terms of $y_1$ and $y_2$ with solutions given by, say, $x_1 = h_1(y_1, y_2)$,
$x_2 = h_2(y_1, y_2)$.

2. The functions $g_1$ and $g_2$ have continuous partial derivatives at all points $(x_1, x_2)$
and are such that the following $2 \times 2$ determinant

\[
J(x_1, x_2) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} \\[2ex] \dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} \end{vmatrix} \equiv \frac{\partial g_1}{\partial x_1}\frac{\partial g_2}{\partial x_2} - \frac{\partial g_1}{\partial x_2}\frac{\partial g_2}{\partial x_1} \neq 0
\]

at all points $(x_1, x_2)$.

Under these two conditions it can be shown that the random variables $Y_1$ and $Y_2$ are
jointly continuous with joint density function given by

\[
f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(x_1, x_2)\, |J(x_1, x_2)|^{-1} \tag{2.19}
\]

where $x_1 = h_1(y_1, y_2)$, $x_2 = h_2(y_1, y_2)$.

A proof of Equation (2.19) would proceed along the following lines:

\[
P\{Y_1 \le y_1, Y_2 \le y_2\} = \iint_{\substack{(x_1, x_2):\\ g_1(x_1, x_2) \le y_1 \\ g_2(x_1, x_2) \le y_2}} f_{X_1, X_2}(x_1, x_2)\, dx_1\, dx_2 \tag{2.20}
\]

The joint density function can now be obtained by differentiating Equation (2.20) with
respect to $y_1$ and $y_2$. That the result of this differentiation will be equal to the right-hand
side of Equation (2.19) is an exercise in advanced calculus whose proof will not be
presented in the present text.

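Example 2.39 If X and Y are independent gamma random variables with parameters
$(\alpha, \lambda)$ and $(\beta, \lambda)$, respectively, compute the joint density of $U = X + Y$ and
$V = X/(X + Y)$.

Solution: The joint density of X and Y is given by

\[
\begin{aligned}
f_{X,Y}(x, y) &= \frac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} \cdot \frac{\lambda e^{-\lambda y} (\lambda y)^{\beta - 1}}{\Gamma(\beta)} \\
&= \frac{\lambda^{\alpha + \beta}}{\Gamma(\alpha)\Gamma(\beta)}\, e^{-\lambda (x + y)} x^{\alpha - 1} y^{\beta - 1}
\end{aligned}
\]

Now, if $g_1(x, y) = x + y$, $g_2(x, y) = x/(x + y)$, then

\[
\frac{\partial g_1}{\partial x} = \frac{\partial g_1}{\partial y} = 1, \qquad \frac{\partial g_2}{\partial x} = \frac{y}{(x + y)^2}, \qquad \frac{\partial g_2}{\partial y} = -\frac{x}{(x + y)^2}
\]

and so

\[
J(x, y) = \begin{vmatrix} 1 & 1 \\[1ex] \dfrac{y}{(x + y)^2} & \dfrac{-x}{(x + y)^2} \end{vmatrix} = -\frac{1}{x + y}
\]

Finally, because the equations $u = x + y$, $v = x/(x + y)$ have as their solutions
$x = uv$, $y = u(1 - v)$, we see that

\[
\begin{aligned}
f_{U,V}(u, v) &= f_{X,Y}[uv, u(1 - v)]\, u \\
&= \frac{\lambda e^{-\lambda u} (\lambda u)^{\alpha + \beta - 1}}{\Gamma(\alpha + \beta)} \cdot \frac{v^{\alpha - 1} (1 - v)^{\beta - 1}\, \Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}
\end{aligned}
\]

Hence $X + Y$ and $X/(X + Y)$ are independent, with $X + Y$ having a gamma distribution
with parameters $(\alpha + \beta, \lambda)$ and $X/(X + Y)$ having density function

\[
f_V(v) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, v^{\alpha - 1} (1 - v)^{\beta - 1}, \qquad 0 < v < 1
\]

This is called the beta density with parameters $(\alpha, \beta)$.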


This result is quite interesting. For suppose there are $n + m$ jobs to be performed,
with each (independently) taking an exponential amount of time with rate $\lambda$ for performance,
and suppose that we have two workers to perform these jobs. Worker I will
do jobs $1, 2, \ldots, n$, and worker II will do the remaining $m$ jobs. If we let X and Y
denote the total working times of workers I and II, respectively, then upon using the
preceding result it follows that X and Y will be independent gamma random variables
having parameters $(n, \lambda)$ and $(m, \lambda)$, respectively. Then the preceding result yields that
independently of the working time needed to complete all $n + m$ jobs (that is, of $X + Y$),
the proportion of this work that will be performed by worker I has a beta distribution
with parameters $(n, m)$. ■
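The two-worker interpretation lends itself to a quick simulation. The sketch below is a rough illustration under assumed values (n = 3, m = 2, rate 1.5, a fixed seed); it checks that worker I's share has the beta mean n/(n + m) and that the share is uncorrelated with the total time. Zero correlation is only a necessary consequence of independence, so this does not verify the full claim.

```python
import random


def simulate(n_jobs_1, n_jobs_2, rate, trials, seed=12345):
    """Return lists of total time U = X + Y and worker I's share V = X / (X + Y)."""
    rng = random.Random(seed)
    totals, shares = [], []
    for _ in range(trials):
        x = sum(rng.expovariate(rate) for _ in range(n_jobs_1))  # worker I
        y = sum(rng.expovariate(rate) for _ in range(n_jobs_2))  # worker II
        totals.append(x + y)
        shares.append(x / (x + y))
    return totals, shares


def correlation(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


n, m, rate = 3, 2, 1.5  # illustrative values only
totals, shares = simulate(n, m, rate, trials=50000)
mean_share = sum(shares) / len(shares)
print(mean_share, n / (n + m))       # the beta (n, m) mean is n / (n + m)
print(correlation(totals, shares))   # expected to be close to 0
```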

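When the joint density function of the n random variables $X_1, X_2, \ldots, X_n$ is given
and we want to compute the joint density function of $Y_1, Y_2, \ldots, Y_n$, where

\[
Y_1 = g_1(X_1, \ldots, X_n), \quad Y_2 = g_2(X_1, \ldots, X_n), \quad \ldots, \quad Y_n = g_n(X_1, \ldots, X_n)
\]

the approach is the same. Namely, we assume that the functions $g_i$ have continuous
partial derivatives and that the Jacobian determinant $J(x_1, \ldots, x_n) \neq 0$ at all points
$(x_1, \ldots, x_n)$, where

\[
J(x_1, \ldots, x_n) = \begin{vmatrix}
\dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\[2ex]
\dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & \cdots & \dfrac{\partial g_2}{\partial x_n} \\[1ex]
\vdots & \vdots & & \vdots \\[1ex]
\dfrac{\partial g_n}{\partial x_1} & \dfrac{\partial g_n}{\partial x_2} & \cdots & \dfrac{\partial g_n}{\partial x_n}
\end{vmatrix}
\]

Furthermore, we suppose that the equations $y_1 = g_1(x_1, \ldots, x_n)$, $y_2 = g_2(x_1, \ldots, x_n)$,
..., $y_n = g_n(x_1, \ldots, x_n)$ have a unique solution, say, $x_1 = h_1(y_1, \ldots, y_n)$, ...,
$x_n = h_n(y_1, \ldots, y_n)$. Under these assumptions the joint density function of the random
variables $Y_i$ is given by

\[
f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\, |J(x_1, \ldots, x_n)|^{-1}
\]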

where $x_i = h_i(y_1, \ldots, y_n)$, $i = 1, 2, \ldots, n$.


2.6 Moment Generating Functions

The moment generating function $\phi(t)$ of the random variable X is defined for all values
t by

\[
\phi(t) = E[e^{tX}] = \begin{cases} \sum_x e^{tx} p(x), & \text{if X is discrete} \\[1ex] \int_{-\infty}^{\infty} e^{tx} f(x)\, dx, & \text{if X is continuous} \end{cases}
\]

We call $\phi(t)$ the moment generating function because all of the moments of X can be
obtained by successively differentiating $\phi(t)$. For example,

\[
\begin{aligned}
\phi'(t) &= \frac{d}{dt} E[e^{tX}] \\
&= E\left[\frac{d}{dt}(e^{tX})\right] \\
&= E[Xe^{tX}]
\end{aligned}
\]

Hence,

\[
\phi'(0) = E[X]
\]

Similarly,

\[
\begin{aligned}
\phi''(t) &= \frac{d}{dt} \phi'(t) \\
&= \frac{d}{dt} E[Xe^{tX}] \\
&= E\left[\frac{d}{dt}(Xe^{tX})\right] \\
&= E[X^2 e^{tX}]
\end{aligned}
\]

and so

\[
\phi''(0) = E[X^2]
\]

In general, the nth derivative of $\phi(t)$ evaluated at $t = 0$ equals $E[X^n]$, that is,

\[
\phi^{(n)}(0) = E[X^n], \qquad n \ge 1
\]

We now compute $\phi(t)$ for some common distributions.

Example 2.40 (The Binomial Distribution with Parameters n and p)

\[
\begin{aligned}
\phi(t) &= E[e^{tX}] \\
&= \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k (1 - p)^{n-k} \\
&= (pe^t + 1 - p)^n
\end{aligned}
\]

Hence,

\[
\phi'(t) = n(pe^t + 1 - p)^{n-1} pe^t
\]

and so

\[
E[X] = \phi'(0) = np
\]

which checks with the result obtained in Example 2.17. Differentiating a second time
yields

\[
\phi''(t) = n(n - 1)(pe^t + 1 - p)^{n-2} (pe^t)^2 + n(pe^t + 1 - p)^{n-1} pe^t
\]

and so

\[
E[X^2] = \phi''(0) = n(n - 1)p^2 + np
\]

Thus, the variance of X is given by

\[
\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= n(n - 1)p^2 + np - n^2 p^2 \\
&= np(1 - p)
\end{aligned}
\]
■


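Example 2.41 (The Poisson Distribution with Mean $\lambda$)

\[
\begin{aligned}
\phi(t) &= E[e^{tX}] \\
&= \sum_{n=0}^{\infty} \frac{e^{tn} e^{-\lambda} \lambda^n}{n!} \\
&= e^{-\lambda} \sum_{n=0}^{\infty} \frac{(\lambda e^t)^n}{n!} \\
&= e^{-\lambda} e^{\lambda e^t} \\
&= \exp\{\lambda(e^t - 1)\}
\end{aligned}
\]

Differentiation yields

\[
\begin{aligned}
\phi'(t) &= \lambda e^t \exp\{\lambda(e^t - 1)\}, \\
\phi''(t) &= (\lambda e^t)^2 \exp\{\lambda(e^t - 1)\} + \lambda e^t \exp\{\lambda(e^t - 1)\}
\end{aligned}
\]

and so

\[
\begin{aligned}
E[X] &= \phi'(0) = \lambda, \\
E[X^2] &= \phi''(0) = \lambda^2 + \lambda, \\
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 = \lambda
\end{aligned}
\]

Thus, both the mean and the variance of the Poisson equal $\lambda$. ■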
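The relationship between derivatives of the moment generating function at 0 and the moments can also be seen numerically. The sketch below approximates the first two derivatives of the Poisson moment generating function by central finite differences; the step size h and the value lam = 3 are assumptions chosen for illustration.

```python
from math import exp


def poisson_mgf(t, lam):
    """phi(t) = exp{lam (e^t - 1)}, the Poisson moment generating function."""
    return exp(lam * (exp(t) - 1.0))


def first_two_moments(mgf, h=1e-4):
    """Approximate phi'(0) and phi''(0) by central finite differences."""
    first = (mgf(h) - mgf(-h)) / (2.0 * h)
    second = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return first, second


lam = 3.0  # illustrative value
m1, m2 = first_two_moments(lambda t: poisson_mgf(t, lam))
print(m1, m2, m2 - m1 ** 2)  # compare with lam, lam**2 + lam, lam
```

The same `first_two_moments` helper can be pointed at any other moment generating function that is finite in a neighborhood of 0.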

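Example 2.42 (The Exponential Distribution with Parameter $\lambda$)

\[
\begin{aligned}
\phi(t) &= E[e^{tX}] \\
&= \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\, dx \\
&= \lambda \int_0^{\infty} e^{-(\lambda - t)x}\, dx \\
&= \frac{\lambda}{\lambda - t} \qquad \text{for } t < \lambda
\end{aligned}
\]

We note by the preceding derivation that, for the exponential distribution, $\phi(t)$ is only
defined for values of t less than $\lambda$. Differentiation of $\phi(t)$ yields

\[
\phi'(t) = \frac{\lambda}{(\lambda - t)^2}, \qquad \phi''(t) = \frac{2\lambda}{(\lambda - t)^3}
\]

Hence,

\[
E[X] = \phi'(0) = \frac{1}{\lambda}, \qquad E[X^2] = \phi''(0) = \frac{2}{\lambda^2}
\]

The variance of X is thus given by

\[
\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{1}{\lambda^2}
\]
■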

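Example 2.43 (The Normal Distribution with Parameters $\mu$ and $\sigma^2$) The moment
generating function of a standard normal random variable Z is obtained as follows.

\[
\begin{aligned}
E[e^{tZ}] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x^2 - 2tx)/2}\, dx \\
&= e^{t^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x - t)^2/2}\, dx \\
&= e^{t^2/2}
\end{aligned}
\]

If Z is a standard normal, then $X = \sigma Z + \mu$ is normal with parameters $\mu$ and $\sigma^2$;
therefore,

\[
\phi(t) = E[e^{tX}] = E[e^{t(\sigma Z + \mu)}] = e^{t\mu} E[e^{t\sigma Z}] = \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\}
\]

By differentiating we obtain

\[
\begin{aligned}
\phi'(t) &= (\mu + t\sigma^2) \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\}, \\
\phi''(t) &= (\mu + t\sigma^2)^2 \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\} + \sigma^2 \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\}
\end{aligned}
\]

and so

\[
\begin{aligned}
E[X] &= \phi'(0) = \mu, \\
E[X^2] &= \phi''(0) = \mu^2 + \sigma^2
\end{aligned}
\]

implying that

\[
\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \sigma^2
\end{aligned}
\]
■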

Tables 2.1 and 2.2 give the moment generating function for some common distributions.

Table 2.1

| Discrete probability distribution | Probability mass function, $p(x)$ | Moment generating function, $\phi(t)$ | Mean | Variance |
|---|---|---|---|---|
| Binomial with parameters $n, p$, $0 \le p \le 1$ | $\binom{n}{x} p^x (1 - p)^{n-x}$, $x = 0, 1, \ldots, n$ | $(pe^t + (1 - p))^n$ | $np$ | $np(1 - p)$ |
| Poisson with parameter $\lambda > 0$ | $e^{-\lambda} \dfrac{\lambda^x}{x!}$, $x = 0, 1, 2, \ldots$ | $\exp\{\lambda(e^t - 1)\}$ | $\lambda$ | $\lambda$ |
| Geometric with parameter $0 \le p \le 1$ | $p(1 - p)^{x-1}$, $x = 1, 2, \ldots$ | $\dfrac{pe^t}{1 - (1 - p)e^t}$ | $\dfrac{1}{p}$ | $\dfrac{1 - p}{p^2}$ |

Table 2.2

| Continuous probability distribution | Probability density function, $f(x)$ | Moment generating function, $\phi(t)$ | Mean | Variance |
|---|---|---|---|---|
| Uniform over $(a, b)$ | $\dfrac{1}{b - a}$ for $a < x < b$; $0$ otherwise | $\dfrac{e^{tb} - e^{ta}}{t(b - a)}$ | $\dfrac{a + b}{2}$ | $\dfrac{(b - a)^2}{12}$ |
| Exponential with parameter $\lambda > 0$ | $\lambda e^{-\lambda x}$ for $x \ge 0$; $0$ for $x < 0$ | $\dfrac{\lambda}{\lambda - t}$ | $\dfrac{1}{\lambda}$ | $\dfrac{1}{\lambda^2}$ |
| Gamma with parameters $(n, \lambda)$, $\lambda > 0$ | $\dfrac{\lambda e^{-\lambda x} (\lambda x)^{n-1}}{(n - 1)!}$ for $x \ge 0$; $0$ for $x < 0$ | $\left(\dfrac{\lambda}{\lambda - t}\right)^n$ | $\dfrac{n}{\lambda}$ | $\dfrac{n}{\lambda^2}$ |
| Normal with parameters $(\mu, \sigma^2)$ | $\dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\{-(x - \mu)^2 / 2\sigma^2\}$, $-\infty < x < \infty$ | $\exp\left\{\mu t + \dfrac{\sigma^2 t^2}{2}\right\}$ | $\mu$ | $\sigma^2$ |

An important property of moment generating functions is that the moment generating
function of the sum of independent random variables is just the product of the individual
moment generating functions. To see this, suppose that X and Y are independent and
have moment generating functions $\phi_X(t)$ and $\phi_Y(t)$, respectively. Then $\phi_{X+Y}(t)$, the
moment generating function of X + Y, is given by

\[
\begin{aligned}
\phi_{X+Y}(t) &= E[e^{t(X+Y)}] \\
&= E[e^{tX} e^{tY}] \\
&= E[e^{tX}] E[e^{tY}] \\
&= \phi_X(t) \phi_Y(t)
\end{aligned}
\]

where the next to the last equality follows from Proposition 2.3 since X and Y are
independent.

Another important result is that the moment generating function uniquely determines 
the distribution. That is, there exists a one-to-one correspondence between the moment 
generating function and the distribution function of a random variable. 

Example 2.44 (Sums of Independent Binomial Random Variables) If X and Y
are independent binomial random variables with parameters $(n, p)$ and $(m, p)$, respectively,
then what is the distribution of X + Y?

Solution: The moment generating function of X + Y is given by

\[
\begin{aligned}
\phi_{X+Y}(t) = \phi_X(t) \phi_Y(t) &= (pe^t + 1 - p)^n (pe^t + 1 - p)^m \\
&= (pe^t + 1 - p)^{m+n}
\end{aligned}
\]

But $(pe^t + (1 - p))^{m+n}$ is just the moment generating function of a binomial random
variable having parameters $m + n$ and $p$. Thus, this must be the distribution of
X + Y. ■

Example 2.45 (Sums of Independent Poisson Random Variables) Calculate the
distribution of X + Y when X and Y are independent Poisson random variables with
means $\lambda_1$ and $\lambda_2$, respectively.

Solution:

\[
\begin{aligned}
\phi_{X+Y}(t) &= \phi_X(t) \phi_Y(t) \\
&= e^{\lambda_1 (e^t - 1)} e^{\lambda_2 (e^t - 1)} \\
&= e^{(\lambda_1 + \lambda_2)(e^t - 1)}
\end{aligned}
\]

Hence, X + Y is Poisson distributed with mean $\lambda_1 + \lambda_2$, verifying the result given
in Example 2.37. ■

Example 2.46 (Sums of Independent Normal Random Variables) Show that if X
and Y are independent normal random variables with parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$,
respectively, then X + Y is normal with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.

Solution:

\[
\begin{aligned}
\phi_{X+Y}(t) &= \phi_X(t) \phi_Y(t) \\
&= \exp\left\{\frac{\sigma_1^2 t^2}{2} + \mu_1 t\right\} \exp\left\{\frac{\sigma_2^2 t^2}{2} + \mu_2 t\right\} \\
&= \exp\left\{\frac{(\sigma_1^2 + \sigma_2^2) t^2}{2} + (\mu_1 + \mu_2) t\right\}
\end{aligned}
\]

which is the moment generating function of a normal random variable with mean
$\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. Hence, the result follows since the moment generating
function uniquely determines the distribution. ■

Example 2.47 (The Poisson Paradigm) We showed in Section 2.2.4 that the number
of successes that occur in n independent trials, each of which results in a success with
probability p is, when n is large and p small, approximately a Poisson random variable
with parameter $\lambda = np$. This result, however, can be substantially strengthened. First
it is not necessary that the trials have the same success probability, only that all the
success probabilities are small. To see that this is the case, suppose that the trials
are independent, with trial i resulting in a success with probability $p_i$, where all the
$p_i$, $i = 1, \ldots, n$ are small. Letting $X_i$ equal 1 if trial i is a success, and 0 otherwise, it
follows that the number of successes, call it X, can be expressed as

\[
X = \sum_{i=1}^{n} X_i
\]

Using that $X_i$ is a Bernoulli (or binary) random variable, its moment generating function
is

\[
E[e^{tX_i}] = p_i e^t + 1 - p_i = 1 + p_i(e^t - 1)
\]

Now, using the result that, for $|x|$ small,

\[
e^x \approx 1 + x
\]

it follows, because $p_i(e^t - 1)$ is small when $p_i$ is small, that

\[
E[e^{tX_i}] = 1 + p_i(e^t - 1) \approx \exp\{p_i(e^t - 1)\}
\]

Because the moment generating function of a sum of independent random variables is
the product of their moment generating functions, the preceding implies that

\[
E[e^{tX}] \approx \prod_{i=1}^{n} \exp\{p_i(e^t - 1)\} = \exp\Big\{\sum_{i} p_i (e^t - 1)\Big\}
\]

But the right side of the preceding is the moment generating function of a Poisson random
variable with mean $\sum_i p_i$, thus arguing that this is approximately the distribution
of X.

Not only is it not necessary for the trials to have the same success probability for
the number of successes to approximately have a Poisson distribution, they need not
even be independent, provided that their dependence is weak. For instance, recall the
matching problem (Example 2.31) where n people randomly select hats from a set
consisting of one hat from each person. By regarding the random selections of hats as
constituting n trials, where we say that trial i is a success if person i chooses his or her
own hat, it follows that, with $A_i$ being the event that trial i is a success,

\[
P(A_i) = \frac{1}{n} \qquad \text{and} \qquad P(A_i \mid A_j) = \frac{1}{n - 1}, \quad j \neq i
\]

Hence, whereas the trials are not independent, their dependence appears, for large n,
to be weak. Because of this weak dependence, and the small trial success probabilities,
it would seem that the number of matches should approximately have a Poisson distribution
with mean 1 when n is large, and this is shown to be the case in Example 3.23.

The statement that “the number of successes in n trials that are either independent
or at most weakly dependent is, when the trial success probabilities are all small,
approximately a Poisson random variable” is known as the Poisson paradigm. ■
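To see the first part of the Poisson paradigm at work, one can compute the exact distribution of the number of successes in independent trials with unequal small success probabilities and compare it with the Poisson mass function whose mean is the sum of those probabilities. The following is a sketch; the 200 probabilities between 0.005 and 0.015 are assumed purely for illustration.

```python
from math import exp, factorial


def success_count_pmf(probs):
    """Exact pmf of the number of successes in independent trials with the given p_i."""
    pmf = [1.0]  # before any trial, zero successes with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)   # trial fails
            nxt[k + 1] += q * p       # trial succeeds
        pmf = nxt
    return pmf


def poisson_pmf(lam, k):
    """P{X = k} for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


# Assumed example: 200 trials with unequal, small success probabilities.
probs = [0.005 + 0.00005 * i for i in range(200)]
lam = sum(probs)
exact = success_count_pmf(probs)
for k in range(6):
    print(k, round(exact[k], 4), round(poisson_pmf(lam, k), 4))
```

Making the individual probabilities larger while keeping their sum fixed should visibly worsen the agreement, which matches the requirement that every trial probability be small.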

Remark For a nonnegative random variable X, it is often convenient to define its
Laplace transform $g(t)$, $t \ge 0$, by

\[
g(t) = \phi(-t) = E[e^{-tX}]
\]

That is, the Laplace transform evaluated at t is just the moment generating function
evaluated at $-t$. The advantage of dealing with the Laplace transform, rather than the
moment generating function, when the random variable is nonnegative is that if $X \ge 0$
and $t \ge 0$, then

\[
0 \le e^{-tX} \le 1
\]

That is, the Laplace transform is always between 0 and 1. As in the case of moment
generating functions, it remains true that nonnegative random variables that have the
same Laplace transform must also have the same distribution. ■

It is also possible to define the joint moment generating function of two or more
random variables. This is done as follows. For any n random variables $X_1, \ldots, X_n$,
the joint moment generating function, $\phi(t_1, \ldots, t_n)$, is defined for all real values of
$t_1, \ldots, t_n$ by

\[
\phi(t_1, \ldots, t_n) = E[e^{(t_1 X_1 + \cdots + t_n X_n)}]
\]

It can be shown that $\phi(t_1, \ldots, t_n)$ uniquely determines the joint distribution of
$X_1, \ldots, X_n$.


Example 2.48 (The Multivariate Normal Distribution) Let $Z_1, \ldots, Z_n$ be a set of n
independent standard normal random variables. If, for some constants $a_{ij}$,
$1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,

\[
\begin{aligned}
X_1 &= a_{11} Z_1 + \cdots + a_{1n} Z_n + \mu_1, \\
X_2 &= a_{21} Z_1 + \cdots + a_{2n} Z_n + \mu_2, \\
&\;\;\vdots \\
X_i &= a_{i1} Z_1 + \cdots + a_{in} Z_n + \mu_i, \\
&\;\;\vdots \\
X_m &= a_{m1} Z_1 + \cdots + a_{mn} Z_n + \mu_m
\end{aligned}
\]

then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal distribution.

It follows from the fact that the sum of independent normal random variables is
itself a normal random variable that each $X_i$ is a normal random variable with mean
and variance given by

\[
E[X_i] = \mu_i, \qquad \operatorname{Var}(X_i) = \sum_{j=1}^{n} a_{ij}^2
\]

Let us now determine

\[
\phi(t_1, \ldots, t_m) = E[\exp\{t_1 X_1 + \cdots + t_m X_m\}]
\]

the joint moment generating function of $X_1, \ldots, X_m$. The first thing to note is that since
$\sum_{i=1}^{m} t_i X_i$ is itself a linear combination of the independent normal random variables
$Z_1, \ldots, Z_n$, it is also normally distributed. Its mean and variance are respectively

\[
E\left[\sum_{i=1}^{m} t_i X_i\right] = \sum_{i=1}^{m} t_i \mu_i
\]

and

\[
\begin{aligned}
\operatorname{Var}\left(\sum_{i=1}^{m} t_i X_i\right) &= \operatorname{Cov}\left(\sum_{i=1}^{m} t_i X_i,\; \sum_{j=1}^{m} t_j X_j\right) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j \operatorname{Cov}(X_i, X_j)
\end{aligned}
\]

Now, if Y is a normal random variable with mean $\mu$ and variance $\sigma^2$, then

\[
E[e^{Y}] = \phi_Y(t)\big|_{t=1} = e^{\mu + \sigma^2/2}
\]

Thus, we see that

\[
\phi(t_1, \ldots, t_m) = \exp\left\{\sum_{i=1}^{m} t_i \mu_i + \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j \operatorname{Cov}(X_i, X_j)\right\}
\]

which shows that the joint distribution of $X_1, \ldots, X_m$ is completely determined from
a knowledge of the values of $E[X_i]$ and $\operatorname{Cov}(X_i, X_j)$, $i, j = 1, \ldots, m$. ■


2.6.1 The Joint Distribution of the Sample Mean and Sample Variance 
from a Normal Population 

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables, each with
mean $\mu$ and variance $\sigma^2$. The random variable $S^2$ defined by

\[
S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}
\]

is called the sample variance of these data. To compute $E[S^2]$ we use the identity

\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \tag{2.21}
\]

which is proven as follows:

\[
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2 \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + n(\mu - \bar{X})^2 + 2(\mu - \bar{X}) \sum_{i=1}^{n} (X_i - \mu) \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + n(\mu - \bar{X})^2 + 2(\mu - \bar{X})(n\bar{X} - n\mu) \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + n(\mu - \bar{X})^2 - 2n(\mu - \bar{X})^2
\end{aligned}
\]

and Identity (2.21) follows.

Using Identity (2.21) gives

\[
\begin{aligned}
E[(n - 1)S^2] &= \sum_{i=1}^{n} E[(X_i - \mu)^2] - nE[(\bar{X} - \mu)^2] \\
&= n\sigma^2 - n \operatorname{Var}(\bar{X}) \\
&= (n - 1)\sigma^2 \qquad \text{from Proposition 2.4(b)}
\end{aligned}
\]

Thus, we obtain from the preceding that

\[
E[S^2] = \sigma^2
\]

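We will now determine the joint distribution of the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$
and the sample variance $S^2$ when the $X_i$ have a normal distribution. To begin we need
the concept of a chi-squared random variable.

Definition 2.2 If $Z_1, \ldots, Z_n$ are independent standard normal random variables,
then the random variable $\sum_{i=1}^{n} Z_i^2$ is said to be a chi-squared random variable with n
degrees of freedom.

We shall now compute the moment generating function of $\sum_{i=1}^{n} Z_i^2$. To begin, note
that, for $t < 1/2$,

\[
\begin{aligned}
E[\exp\{tZ_i^2\}] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2} e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2}\, dx \qquad \text{where } \sigma^2 = (1 - 2t)^{-1} \\
&= \sigma \\
&= (1 - 2t)^{-1/2}
\end{aligned}
\]

Hence,

\[
E\left[\exp\left\{t \sum_{i=1}^{n} Z_i^2\right\}\right] = \prod_{i=1}^{n} E[\exp\{tZ_i^2\}] = (1 - 2t)^{-n/2}
\]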

Now, let $X_1, \ldots, X_n$ be independent normal random variables, each with mean $\mu$ and
variance $\sigma^2$, and let $\bar{X} = \sum_{i=1}^{n} X_i / n$ and $S^2$ denote their sample mean and sample
variance. Since the sum of independent normal random variables is also a normal
random variable, it follows that $\bar{X}$ is a normal random variable with expected value $\mu$
and variance $\sigma^2/n$. In addition, from Proposition 2.4,

\[
\operatorname{Cov}(\bar{X}, X_i - \bar{X}) = 0, \qquad i = 1, \ldots, n \tag{2.22}
\]

Also, since $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ are all linear combinations of the independent
standard normal random variables $(X_i - \mu)/\sigma$, $i = 1, \ldots, n$, it follows that
the random variables $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ have a joint distribution that
is multivariate normal. However, if we let Y be a normal random variable with mean
$\mu$ and variance $\sigma^2/n$ that is independent of $X_1, \ldots, X_n$, then the random variables
$Y, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ also have a multivariate normal distribution, and by
Equation (2.22), they have the same expected values and covariances as the random
variables $\bar{X}, X_i - \bar{X}$, $i = 1, \ldots, n$. Thus, since a multivariate normal distribution is
completely determined by its expected values and covariances, we can conclude that the
random vectors $Y, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ and $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$
have the same joint distribution; thus showing that $\bar{X}$ is independent of the sequence
of deviations $X_i - \bar{X}$, $i = 1, \ldots, n$.

Since $\bar{X}$ is independent of the sequence of deviations $X_i - \bar{X}$, $i = 1, \ldots, n$, it
follows that it is also independent of the sample variance

\[
S^2 \equiv \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}
\]

To determine the distribution of $S^2$, use Identity (2.21) to obtain

\[
(n - 1)S^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2
\]

Dividing both sides of this equation by $\sigma^2$ yields

\[
\frac{(n - 1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \tag{2.23}
\]

Now, $\sum_{i=1}^{n} (X_i - \mu)^2/\sigma^2$ is the sum of the squares of n independent standard normal
random variables, and so is a chi-squared random variable with n degrees of freedom;
it thus has moment generating function $(1 - 2t)^{-n/2}$. Also $[(\bar{X} - \mu)/(\sigma/\sqrt{n})]^2$ is the
square of a standard normal random variable and so is a chi-squared random variable
with one degree of freedom; it thus has moment generating function $(1 - 2t)^{-1/2}$. In
addition, we have previously seen that the two random variables on the left side of
Equation (2.23) are independent. Therefore, because the moment generating function
of the sum of independent random variables is equal to the product of their individual
moment generating functions, we obtain that

\[
E[e^{t(n-1)S^2/\sigma^2}]\, (1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}
\]

or

\[
E[e^{t(n-1)S^2/\sigma^2}] = (1 - 2t)^{-(n-1)/2}
\]

But because $(1 - 2t)^{-(n-1)/2}$ is the moment generating function of a chi-squared random
variable with $n - 1$ degrees of freedom, we can conclude, since the moment generating
function uniquely determines the distribution of the random variable, that this is the
distribution of $(n - 1)S^2/\sigma^2$.

Summing up, we have shown the following. 

Proposition 2.5 If $X_1, \ldots, X_n$ are independent and identically distributed normal
random variables with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X}$ and the sample
variance $S^2$ are independent. $\bar{X}$ is a normal random variable with mean $\mu$ and variance
$\sigma^2/n$; $(n - 1)S^2/\sigma^2$ is a chi-squared random variable with $n - 1$ degrees of freedom.
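A small simulation can make Proposition 2.5 more concrete. The sketch below draws repeated normal samples and checks that the sample variance averages to the true variance and that the scaled sample variance has the mean n - 1 and variance 2(n - 1) of a chi-squared random variable with n - 1 degrees of freedom (these two chi-squared moments are standard facts, not derived in the text). The seed and the values n = 5, mean 10, standard deviation 2 are assumptions for illustration.

```python
import random


def sample_mean_and_variance(n, mu, sigma, rng):
    """Draw n normal observations; return (sample mean, sample variance S^2)."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2


rng = random.Random(2024)          # fixed seed so the run is repeatable
n, mu, sigma = 5, 10.0, 2.0        # illustrative values only
pairs = [sample_mean_and_variance(n, mu, sigma, rng) for _ in range(40000)]
scaled = [(n - 1) * s2 / sigma ** 2 for _, s2 in pairs]

mean_s2 = sum(s2 for _, s2 in pairs) / len(pairs)
mean_scaled = sum(scaled) / len(scaled)
var_scaled = sum((v - mean_scaled) ** 2 for v in scaled) / len(scaled)
print(mean_s2)       # near sigma^2 = 4
print(mean_scaled)   # near n - 1 = 4
print(var_scaled)    # near 2(n - 1) = 8
```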


2.7 The Distribution of the Number of Events that Occur

Consider arbitrary events $A_1, \ldots, A_n$, and let X denote the number of these events that
occur. We will determine the probability mass function of X. To begin, for $1 \le k \le n$, let

\[
S_k = \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k})
\]

equal the sum of the probabilities of all the $\binom{n}{k}$ intersections of k distinct events, and
note that the inclusion-exclusion identity states that

\[
P(X > 0) = P(\cup_{i=1}^{n} A_i) = S_1 - S_2 + S_3 - \cdots + (-1)^{n+1} S_n
\]

Now, fix k of the n events, say $A_{i_1}, \ldots, A_{i_k}$, and let

\[
A = \cap_{j=1}^{k} A_{i_j}
\]

be the event that all k of these events occur. Also, let

\[
B = \cap_{j \notin \{i_1, \ldots, i_k\}} A_j^c
\]

be the event that none of the other $n - k$ events occur. Consequently, $AB$ is the event
that $A_{i_1}, \ldots, A_{i_k}$ are the only events to occur. Because

\[
A = AB \cup AB^c
\]

we have

\[
P(A) = P(AB) + P(AB^c)
\]

or, equivalently,

\[
P(AB) = P(A) - P(AB^c)
\]

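Because $B^c$ occurs if at least one of the events $A_j$, $j \notin \{i_1, \ldots, i_k\}$, occurs, we see that

\[
B^c = \cup_{j \notin \{i_1, \ldots, i_k\}} A_j
\]

Thus,

\[
P(AB^c) = P(A \cup_{j \notin \{i_1, \ldots, i_k\}} A_j) = P(\cup_{j \notin \{i_1, \ldots, i_k\}} AA_j)
\]

Applying the inclusion-exclusion identity gives

\[
\begin{aligned}
P(AB^c) &= \sum_{j \notin \{i_1, \ldots, i_k\}} P(AA_j) - \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(AA_{j_1}A_{j_2}) \\
&\quad + \sum_{j_1 < j_2 < j_3 \notin \{i_1, \ldots, i_k\}} P(AA_{j_1}A_{j_2}A_{j_3}) - \cdots
\end{aligned}
\]

Using that $A = \cap_{j=1}^{k} A_{i_j}$, the preceding shows that the probability that the k events
$A_{i_1}, \ldots, A_{i_k}$ are the only events to occur is

\[
\begin{aligned}
P(A) - P(AB^c) &= P(A_{i_1} \cdots A_{i_k}) - \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j) \\
&\quad + \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2}) \\
&\quad - \sum_{j_1 < j_2 < j_3 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2} A_{j_3}) + \cdots
\end{aligned}
\]

Summing the preceding over all sets of k distinct indices yields

\[
\begin{aligned}
P(X = k) &= \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k}) - \sum_{i_1 < \cdots < i_k}\; \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j) \\
&\quad + \sum_{i_1 < \cdots < i_k}\; \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2}) - \cdots
\end{aligned}
\tag{2.24}
\]

First, note that

\[
\sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k}) = S_k
\]

Now, consider

\[
\sum_{i_1 < \cdots < i_k}\; \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j)
\]

The probability of every intersection of $k + 1$ distinct events $A_{m_1}, \ldots, A_{m_{k+1}}$ will
appear $\binom{k+1}{k}$ times in this multiple summation. This is so because each choice of k
of its indices to play the role of $i_1, \ldots, i_k$ and the other to play the role of j results in
the addition of the term $P(A_{m_1} \cdots A_{m_{k+1}})$. Hence,

\[
\begin{aligned}
\sum_{i_1 < \cdots < i_k}\; \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j) &= \binom{k+1}{k} \sum_{m_1 < \cdots < m_{k+1}} P(A_{m_1} \cdots A_{m_{k+1}}) \\
&= \binom{k+1}{k} S_{k+1}
\end{aligned}
\]

Similarly, because the probability of every intersection of $k + 2$ distinct events
$A_{m_1}, \ldots, A_{m_{k+2}}$ will appear $\binom{k+2}{k}$ times in
$\sum_{i_1 < \cdots < i_k} \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2})$, it follows that

\[
\sum_{i_1 < \cdots < i_k}\; \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2}) = \binom{k+2}{k} S_{k+2}
\]

Repeating this argument for the rest of the multiple summations in (2.24) yields the
result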

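\[
P(X = k) = S_k - \binom{k+1}{k} S_{k+1} + \binom{k+2}{k} S_{k+2} - \cdots + (-1)^{n-k} \binom{n}{k} S_n
\]

The preceding can be written as

\[
P(X = k) = \sum_{j=k}^{n} (-1)^{k+j} \binom{j}{k} S_j
\]

Using this we will now prove that

\[
P(X \ge k) = \sum_{j=k}^{n} (-1)^{k+j} \binom{j-1}{k-1} S_j
\]

The proof uses a backwards mathematical induction that starts with k = n. Now, when
k = n the preceding identity states that

\[
P(X = n) = S_n
\]

which is true. So assume that

\[
P(X \ge k + 1) = \sum_{j=k+1}^{n} (-1)^{k+1+j} \binom{j-1}{k} S_j
\]

But then

\[
\begin{aligned}
P(X \ge k) &= P(X = k) + P(X \ge k + 1) \\
&= \sum_{j=k}^{n} (-1)^{k+j} \binom{j}{k} S_j + \sum_{j=k+1}^{n} (-1)^{k+1+j} \binom{j-1}{k} S_j \\
&= S_k + \sum_{j=k+1}^{n} (-1)^{k+j} \left[\binom{j}{k} - \binom{j-1}{k}\right] S_j \\
&= S_k + \sum_{j=k+1}^{n} (-1)^{k+j} \binom{j-1}{k-1} S_j \\
&= \sum_{j=k}^{n} (-1)^{k+j} \binom{j-1}{k-1} S_j
\end{aligned}
\]

which completes the proof.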
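The two identities of this section can be checked by brute force on a small example. The sketch below uses the matching problem with n = 5 hats as an assumed test case: event i is that person i receives his or her own hat, and all 120 assignments are equally likely. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def S(j, n, outcomes):
    """S_j: sum over all j-subsets of P(every event in the subset occurs)."""
    total = Fraction(0)
    for subset in combinations(range(n), j):
        hits = sum(1 for w in outcomes if all(w[i] == i for i in subset))
        total += Fraction(hits, len(outcomes))
    return total


def pmf_from_formula(k, n, outcomes):
    """P(X = k) = sum_{j=k}^{n} (-1)^(k+j) C(j, k) S_j."""
    return sum((-1) ** (k + j) * comb(j, k) * S(j, n, outcomes) for j in range(k, n + 1))


def tail_from_formula(k, n, outcomes):
    """P(X >= k) = sum_{j=k}^{n} (-1)^(k+j) C(j-1, k-1) S_j."""
    return sum((-1) ** (k + j) * comb(j - 1, k - 1) * S(j, n, outcomes) for j in range(k, n + 1))


n = 5
outcomes = list(permutations(range(n)))   # equally likely hat assignments
counts = [sum(w[i] == i for i in range(n)) for w in outcomes]
direct = {k: Fraction(counts.count(k), len(outcomes)) for k in range(1, n + 1)}
print(all(pmf_from_formula(k, n, outcomes) == direct[k] for k in range(1, n + 1)))
print(all(tail_from_formula(k, n, outcomes) == sum(direct[m] for m in range(k, n + 1))
          for k in range(1, n + 1)))
```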


2.8 Limit Theorems 


We start this section by proving a result known as Markov’s inequality.

Proposition 2.6 (Markov’s Inequality) If X is a random variable that takes only
nonnegative values, then for any value a > 0

\[ P\{X \ge a\} \le \frac{E[X]}{a} \]

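Proof. We give a proof for the case where X is continuous with density f.

\[
\begin{aligned}
E[X] &= \int_0^{\infty} x f(x)\,dx \\
&= \int_0^{a} x f(x)\,dx + \int_a^{\infty} x f(x)\,dx \\
&\ge \int_a^{\infty} x f(x)\,dx \\
&\ge \int_a^{\infty} a f(x)\,dx \\
&= a \int_a^{\infty} f(x)\,dx \\
&= a P\{X \ge a\}
\end{aligned}
\]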


and the result is proven. 

As a corollary, we obtain the following. 

Proposition 2.7 (Chebyshev’s Inequality) If X is a random variable with mean μ
and variance σ², then, for any value k > 0,

\[ P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2} \]


Proof. Since (X − μ)² is a nonnegative random variable, we can apply Markov’s
inequality (with a = k²) to obtain

\[ P\{(X - \mu)^2 \ge k^2\} \le \frac{E[(X - \mu)^2]}{k^2} \]

But since (X − μ)² ≥ k² if and only if |X − μ| ≥ k, the preceding is equivalent to

\[ P\{|X - \mu| \ge k\} \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2} \]


and the proof is complete. 

The importance of Markov’s and Chebyshev’s inequalities is that they enable us to 
derive bounds on probabilities when only the mean, or both the mean and the variance, 
of the probability distribution are known. Of course, if the actual distribution were 
known, then the desired probabilities could be exactly computed, and we would not 
need to resort to bounds.

Example 2.49 Suppose we know that the number of items produced in a factory
during a week is a random variable with mean 500.

(a) What can be said about the probability that this week’s production will be at least 
1000 ? 

(b) If the variance of a week’s production is known to equal 100, then what can be said 
about the probability that this week’s production will be between 400 and 600? 

Solution: Let X be the number of items that will be produced in a week. 

(a) By Markov’s inequality,

\[ P\{X \ge 1000\} \le \frac{E[X]}{1000} = \frac{500}{1000} = \frac{1}{2} \]

(b) By Chebyshev’s inequality,

\[ P\{|X - 500| \ge 100\} \le \frac{\sigma^2}{(100)^2} = \frac{1}{100} \]

Hence,

\[ P\{|X - 500| < 100\} \ge 1 - \frac{1}{100} = \frac{99}{100} \]

and so the probability that this week’s production will be between 400 and 600 is at 
least 0.99. ■ 
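The arithmetic of Example 2.49 can be reproduced in a few lines. This is a minimal Python sketch, not part of the original text: the function names `markov_bound` and `chebyshev_bound` are hypothetical, and capping each bound at 1 is an added convenience (a probability bound above 1 carries no information).

```python
from fractions import Fraction


def markov_bound(mean, a):
    """Markov upper bound on P{X >= a} for a nonnegative X with the given mean."""
    return min(Fraction(1), Fraction(mean) / a)


def chebyshev_bound(variance, k):
    """Chebyshev upper bound on P{|X - mu| >= k} for X with the given variance."""
    return min(Fraction(1), Fraction(variance) / k**2)


# Part (a): mean 500, threshold 1000.
part_a = markov_bound(500, 1000)

# Part (b): variance 100, deviation 100; complement gives a lower bound
# on P{|X - 500| < 100}.
part_b = 1 - chebyshev_bound(100, 100)

print(part_a, part_b)  # 1/2 99/100
```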

The following theorem, known as the strong law of large numbers, is probably the 
most well-known result in probability theory. It states that the average of a sequence 
of independent random variables having the same distribution will, with probability 1, 
converge to the mean of that distribution. 

Theorem 2.1 (Strong Law of Large Numbers) Let $X_1, X_2, \ldots$ be a sequence of
independent random variables having a common distribution, and let $E[X_i] = \mu$. Then,
with probability 1,

\[ \frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty \]


As an example of the preceding, suppose that a sequence of independent trials is
performed. Let E be a fixed event and denote by P(E) the probability that E occurs
on any particular trial. Letting

\[
X_i = \begin{cases}
1, & \text{if } E \text{ occurs on the } i\text{th trial} \\
0, & \text{if } E \text{ does not occur on the } i\text{th trial}
\end{cases}
\]

we have by the strong law of large numbers that, with probability 1,

\[ \frac{X_1 + \cdots + X_n}{n} \to E[X] = P(E) \qquad (2.25) \]

Since $X_1 + \cdots + X_n$ represents the number of times that the event E occurs in the
first n trials, we may interpret Equation (2.25) as stating that, with probability 1, the 
limiting proportion of time that the event E occurs is just P(E). 
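The limiting-proportion interpretation can be illustrated by simulation. The following is a minimal Python sketch, not part of the original text: the value P(E) = 0.3, the checkpoints, and the seed are arbitrary assumptions, and a single simulated run only illustrates the convergence; it does not prove it.

```python
import random


def running_proportion(p, n, seed=0):
    """Simulate n independent trials in which E occurs with probability p.

    Returns the proportion of trials on which E occurred, recorded at a
    few checkpoints along the way.
    """
    rng = random.Random(seed)
    checkpoints = (10, 100, 1000, 10000, 100000)
    occurrences = 0
    proportions = {}
    for i in range(1, n + 1):
        occurrences += rng.random() < p  # X_i = 1 if E occurs on trial i
        if i in checkpoints:
            proportions[i] = occurrences / i
    return proportions


props = running_proportion(p=0.3, n=100000)
for trials, proportion in props.items():
    print(trials, proportion)
```

The printed proportions should settle near 0.3 as the number of trials grows.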

Running neck and neck with the strong law of large numbers for the honor of 
being probability theory’s number one result is the central limit theorem. Besides 
its theoretical interest and importance, this theorem provides a simple method for 
computing approximate probabilities for sums of independent random variables. It 
also explains the remarkable fact that the empirical frequencies of so many natural 
“populations” exhibit a bell-shaped (that is, normal) curve. 

Theorem 2.2 (Central Limit Theorem) Let $X_1, X_2, \ldots$ be a sequence of independent,
identically distributed random variables, each with mean μ and variance σ². Then
the distribution of

\[ \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \]

tends to the standard normal as n → ∞. That is,

\[
P\left\{ \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a \right\}
\to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\,dx
\quad \text{as } n \to \infty
\]


Note that like the other results of this section, this theorem holds for any distribution
of the $X_i$s; herein lies its power.

If X is binomially distributed with parameters n and p, then X has the same distribution
as the sum of n independent Bernoulli random variables, each with parameter p.
(Recall that the Bernoulli random variable is just a binomial random variable whose
parameter n equals 1.) Hence, the distribution of

\[ \frac{X - E[X]}{\sqrt{\mathrm{Var}(X)}} = \frac{X - np}{\sqrt{np(1-p)}} \]

approaches the standard normal distribution as n approaches ∞. The normal approximation
will, in general, be quite good for values of n satisfying np(1 − p) ≥ 10.

Example 2.50 (Normal Approximation to the Binomial) Let X be the number of
times that a fair coin, flipped 40 times, lands heads. Find the probability that X = 20.
Use the normal approximation and then compare it to the exact solution.

Solution: Since the binomial is a discrete random variable, and the normal a continuous
random variable, it leads to a better approximation to write the desired
probability as

\[
\begin{aligned}
P\{X = 20\} &= P\{19.5 < X < 20.5\} \\
&= P\left\{ \frac{19.5 - 20}{\sqrt{10}} < \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}} \right\} \\
&\approx P\left\{ -0.16 < \frac{X - 20}{\sqrt{10}} < 0.16 \right\} \\
&\approx \Phi(0.16) - \Phi(-0.16)
\end{aligned}
\]

where Φ(x), the probability that the standard normal is less than x, is given by

\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy \]

By the symmetry of the standard normal distribution

\[ \Phi(-0.16) = P\{N(0, 1) > 0.16\} = 1 - \Phi(0.16) \]

where N(0, 1) is a standard normal random variable. Hence, the desired probability
is approximated by

\[ P\{X = 20\} \approx 2\Phi(0.16) - 1 \]

Using Table 2.3, we obtain

\[ P\{X = 20\} \approx 0.1272 \]
The exact result is

\[ P\{X = 20\} = \binom{40}{20} \left(\frac{1}{2}\right)^{40} \]

which can be shown to equal 0.1254. ■
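The comparison in Example 2.50 can be reproduced numerically. The following is a minimal Python sketch, not part of the original text: the helper name `phi` is an assumption, and it evaluates Φ through the standard library's `math.erf`. The sketch uses the unrounded value 0.5/√10 rather than 0.16, so its approximation differs slightly from a table lookup.

```python
from math import comb, erf, sqrt


def phi(x):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


n, p = 40, 0.5
mean = n * p                  # E[X] = np
sd = sqrt(n * p * (1 - p))    # sqrt(Var(X)) = sqrt(np(1 - p))

# Normal approximation with the continuity correction 19.5 < X < 20.5.
approx = phi((20.5 - mean) / sd) - phi((19.5 - mean) / sd)

# Exact binomial probability C(40, 20) (1/2)^40.
exact = comb(n, 20) * p**20 * (1 - p) ** 20

print(round(approx, 4), round(exact, 4))
```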

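Example 2.51 Let $X_i$, i = 1, 2, ..., 10 be independent random variables, each being
uniformly distributed over (0, 1). Estimate $P\{\sum_{i=1}^{10} X_i > 7\}$.

Solution: Since $E[X_i] = \frac{1}{2}$, $\mathrm{Var}(X_i) = \frac{1}{12}$ we have by the central limit theorem
that

\[
\begin{aligned}
P\left\{ \sum_{i=1}^{10} X_i > 7 \right\}
&= P\left\{ \frac{\sum_{i=1}^{10} X_i - 5}{\sqrt{10\left(\frac{1}{12}\right)}} > \frac{7 - 5}{\sqrt{10\left(\frac{1}{12}\right)}} \right\} \\
&\approx 1 - \Phi(2.2) \\
&= 0.0139
\end{aligned}
\]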
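The estimate in Example 2.51 can be compared with the exact probability. The following is a minimal Python sketch, not part of the original text. It relies on a fact not stated in this section: the sum of n independent uniform (0, 1) variables has the Irwin-Hall distribution, whose distribution function is $\frac{1}{n!}\sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k}(x-k)^n$. The helper name `phi` is an assumption, and the z-value is left unrounded.

```python
from fractions import Fraction
from math import comb, erf, factorial, sqrt


def phi(x):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


n, threshold = 10, 7
mean = n * 0.5      # each uniform (0, 1) variable has mean 1/2
var = n / 12.0      # and variance 1/12

# Central limit theorem estimate of P{sum > 7}.
z = (threshold - mean) / sqrt(var)
clt_estimate = 1.0 - phi(z)

# Exact value: by symmetry about 5, P{sum > 7} = P{sum < 3}, and the
# Irwin-Hall distribution function at 3 needs only the terms k = 0, 1, 2.
exact = Fraction(
    sum((-1) ** k * comb(n, k) * (3 - k) ** n for k in range(3)),
    factorial(n),
)

print(round(z, 3), round(clt_estimate, 4), round(float(exact), 4))
```

With only ten summands the normal estimate is close to, but not identical with, the exact tail probability.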


Example 2.52 The lifetime of a special type of battery is a random variable with mean
40 hours and standard deviation 20 hours. A battery is used until it fails, at which point
it is replaced by a new one. Assuming a stockpile of 25 such batteries, the lifetimes of
which are independent, approximate the probability that over 1100 hours of use can be
obtained.

Solution: If we let $X_i$ denote the lifetime of the ith battery to be put in use, then
we desire $p = P\{X_1 + \cdots + X_{25} > 1100\}$, which is approximated as follows:

\[
\begin{aligned}
p &= P\left\{ \frac{X_1 + \cdots + X_{25} - 1000}{20\sqrt{25}} > \frac{1100 - 1000}{20\sqrt{25}} \right\} \\
&\approx P\{N(0, 1) > 1\} \\
&= 1 - \Phi(1) \\
&\approx 0.1587
\end{aligned}
\]
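The standardization used in Example 2.52 follows the same pattern for any sum of independent, identically distributed lifetimes. The following is a minimal Python sketch, not part of the original text: the function names `phi` and `clt_tail` are hypothetical, and Φ is evaluated through `math.erf` rather than read from a table.

```python
from math import erf, sqrt


def phi(x):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def clt_tail(n, mu, sigma, total):
    """Central limit theorem approximation to P{X_1 + ... + X_n > total}.

    The X_i are assumed independent with common mean mu and standard
    deviation sigma.
    """
    z = (total - n * mu) / (sigma * sqrt(n))
    return 1.0 - phi(z)


# 25 batteries, mean 40 hours, standard deviation 20 hours, target 1100 hours.
p = clt_tail(n=25, mu=40.0, sigma=20.0, total=1100.0)
print(round(p, 4))
```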
Table 2.3 Area Φ(x) under the Standard Normal Curve to the Left of x

x     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5597  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998

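We now present a heuristic proof of the central limit theorem. Suppose first that the
$X_i$ have mean 0 and variance 1, and let $E[e^{tX}]$ denote their common moment generating
function. Then, the moment generating function of $\frac{X_1 + \cdots + X_n}{\sqrt{n}}$ is

\[
\begin{aligned}
E\left[\exp\left\{ t\left( \frac{X_1 + \cdots + X_n}{\sqrt{n}} \right) \right\}\right]
&= E\left[ e^{tX_1/\sqrt{n}}\, e^{tX_2/\sqrt{n}} \cdots e^{tX_n/\sqrt{n}} \right] \\
&= \left( E\left[ e^{tX/\sqrt{n}} \right] \right)^n \quad \text{by independence}
\end{aligned}
\]

Now, for n large, we obtain from the Taylor series expansion of $e^y$ that

\[ e^{tX/\sqrt{n}} \approx 1 + \frac{tX}{\sqrt{n}} + \frac{t^2X^2}{2n} \]

Taking expectations shows that when n is large

\[
\begin{aligned}
E\left[ e^{tX/\sqrt{n}} \right] &\approx 1 + \frac{tE[X]}{\sqrt{n}} + \frac{t^2E[X^2]}{2n} \\
&= 1 + \frac{t^2}{2n} \quad \text{because } E[X] = 0,\ E[X^2] = 1
\end{aligned}
\]

Therefore, we obtain that when n is large

\[
E\left[\exp\left\{ t\left( \frac{X_1 + \cdots + X_n}{\sqrt{n}} \right) \right\}\right] \approx \left( 1 + \frac{t^2}{2n} \right)^n
\]

When n goes to ∞ the approximation can be shown to become exact and we have

\[
\lim_{n \to \infty} E\left[\exp\left\{ t\left( \frac{X_1 + \cdots + X_n}{\sqrt{n}} \right) \right\}\right] = e^{t^2/2}
\]

Thus, the moment generating function of $\frac{X_1 + \cdots + X_n}{\sqrt{n}}$ converges to the moment generating
function of a (standard) normal random variable with mean 0 and variance 1. Using
this, it can be proven that the distribution function of the random variable
$\frac{X_1 + \cdots + X_n}{\sqrt{n}}$ converges to the standard normal distribution function Φ.

When the $X_i$ have mean μ and variance σ², the random variables $\frac{X_i - \mu}{\sigma}$ have mean
0 and variance 1. Thus, the preceding shows that

\[
P\left\{ \frac{X_1 - \mu + X_2 - \mu + \cdots + X_n - \mu}{\sigma\sqrt{n}} \le a \right\} \to \Phi(a)
\]

which proves the central limit theorem.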
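The limit at the heart of the heuristic argument can be observed numerically. The following is a minimal Python sketch, not part of the original text. As a concrete assumption it takes each X to be +1 or −1 with probability 1/2 (mean 0, variance 1), whose moment generating function is cosh t, so the exact moment generating function of the normalized sum is $\cosh(t/\sqrt{n})^n$; the value t = 1 is arbitrary.

```python
from math import cosh, exp, sqrt

t = 1.0
target = exp(t * t / 2.0)  # moment generating function of N(0, 1) at t

rows = []
for n in (1, 10, 100, 1000, 10000):
    # Taylor-based approximation (1 + t^2 / (2n))^n from the heuristic proof.
    taylor = (1.0 + t * t / (2.0 * n)) ** n
    # Exact moment generating function of (X_1 + ... + X_n) / sqrt(n)
    # when each X_i is +1 or -1 with probability 1/2.
    exact_mgf = cosh(t / sqrt(n)) ** n
    rows.append((n, taylor, exact_mgf))

for n, taylor, exact_mgf in rows:
    print(n, round(taylor, 6), round(exact_mgf, 6), round(target, 6))
```

Both columns approach $e^{t^2/2}$ as n grows, which is the convergence the argument relies on.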


2.9 Stochastic Processes 

A stochastic process {X(t), t ∈ T} is a collection of random variables. That is, for
each t ∈ T, X(t) is a random variable. The index t is often interpreted as time and, as
a result, we refer to X(t) as the state of the process at time t. For example, X(t) might
equal the total number of customers that have entered a supermarket by time t; or the
number of customers in the supermarket at time t; or the total amount of sales that have
been recorded in the market by time t; etc.

Figure 2.3 Particle moving around a circle.

The set T is called the index set of the process. When T is a countable set the
stochastic process is said to be a discrete-time process. If T is an interval of the real line,
the stochastic process is said to be a continuous-time process. For instance, {$X_n$, n =
0, 1, ...} is a discrete-time stochastic process indexed by the nonnegative integers; while
{X(t), t ≥ 0} is a continuous-time stochastic process indexed by the nonnegative real
numbers.

The state space of a stochastic process is defined as the set of all possible values
that the random variables X(t) can assume.

Thus, a stochastic process is a family of random variables that describes the evolution 
through time of some (physical) process. We shall see much of stochastic processes in 
the following chapters of this text. 

Example 2.53 Consider a particle that moves along a set of m + 1 nodes, labeled
0, 1, ..., m, that are arranged around a circle (see Figure 2.3). At each step the particle
is equally likely to move one position in either the clockwise or counterclockwise
direction. That is, if $X_n$ is the position of the particle after its nth step then

\[ P\{X_{n+1} = i + 1 \mid X_n = i\} = P\{X_{n+1} = i - 1 \mid X_n = i\} = \frac{1}{2} \]

where i + 1 = 0 when i = m, and i − 1 = m when i = 0. Suppose now that the particle
starts at 0 and continues to move around according to the preceding rules until all the
nodes 1, 2, ..., m have been visited. What is the probability that node i, i = 1, ..., m,
is the last one visited?

Solution: Surprisingly enough, the probability that node i is the last node visited
can be determined without any computations. To do so, consider the first time that
the particle is at one of the two neighbors of node i, that is, the first time that the
particle is at one of the nodes i − 1 or i + 1 (with m + 1 = 0). Suppose it is at
node i − 1 (the argument in the alternative situation is identical). Since neither node
i nor i + 1 has yet been visited, it follows that i will be the last node visited if and
only if i + 1 is visited before i. This is so because in order to visit i + 1 before i
the particle will have to visit all the nodes on the counterclockwise path from i − 1
to i + 1 before it visits i. But the probability that a particle at node i − 1 will visit
i + 1 before i is just the probability that a particle will progress m − 1 steps in a
specified direction before progressing one step in the other direction. That is, it is
equal to the probability that a gambler who starts with one unit, and wins one when
a fair coin turns up heads and loses one when it turns up tails, will have his fortune
go up by m − 1 before he goes broke. Hence, because the preceding implies that the
probability that node i is the last node visited is the same for all i, and because these
probabilities must sum to 1, we obtain

P{i is the last node visited} = 1/m, i = 1, ..., m ■
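The conclusion of Example 2.53 can be checked empirically. The following is a minimal Python sketch, not part of the original text: the function names, the choice m = 5, the number of trials, and the seed are all assumptions made for illustration, and the estimated frequencies are subject to sampling error.

```python
import random
from collections import Counter


def last_node_visited(m, rng):
    """Run one walk on nodes 0..m arranged in a circle, starting at node 0.

    Returns the node among 1..m that is visited last.
    """
    position = 0
    unvisited = set(range(1, m + 1))
    last = None
    while unvisited:
        # Move one position clockwise or counterclockwise, equally likely.
        position = (position + rng.choice((-1, 1))) % (m + 1)
        if position in unvisited:
            unvisited.remove(position)
            last = position
    return last


def estimate(m, trials, seed=0):
    """Estimate P{i is the last node visited} for i = 1, ..., m."""
    rng = random.Random(seed)
    counts = Counter(last_node_visited(m, rng) for _ in range(trials))
    return {i: counts[i] / trials for i in range(1, m + 1)}


freqs = estimate(m=5, trials=20000)
for node, freq in freqs.items():
    print(node, freq)
```

Each estimated frequency should be close to 1/m = 0.2, including for the nodes adjacent to the starting node.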

Remark The argument used in Example 2.53 also shows that a gambler who is
equally likely to either win or lose one unit on each gamble will be down n before
being up 1 with probability 1/(n + 1); or equivalently,

\[ P\{\text{gambler is up 1 before being down } n\} = \frac{n}{n+1} \]

Suppose now we want the probability that the gambler is up 2 before being down n.
Upon conditioning on whether he reaches up 1 before down n, we obtain that

\[
\begin{aligned}
P\{\text{gambler is up 2 before being down } n\}
&= P\{\text{up 2 before down } n \mid \text{up 1 before down } n\}\,\frac{n}{n+1} \\
&= P\{\text{up 1 before down } n+1\}\,\frac{n}{n+1} \\
&= \frac{n+1}{n+2} \cdot \frac{n}{n+1} = \frac{n}{n+2}
\end{aligned}
\]

Repeating this argument yields that

\[ P\{\text{gambler is up } k \text{ before being down } n\} = \frac{n}{n+k} \]
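The repeated conditioning in the Remark amounts to a telescoping product, which can be written out directly. The following is a minimal Python sketch, not part of the original text: both function names are hypothetical. The second function is an independent check, added as an assumption of this sketch, that h(x) = (x + n)/(n + k) satisfies the fair-game averaging equations with the correct boundary values.

```python
from fractions import Fraction


def up_k_before_down_n(k, n):
    """P{gambler is up k before being down n}, by repeated conditioning.

    Once the gambler is up j, reaching up j + 1 before down n is an
    "up 1 before down n + j" problem, with probability (n + j)/(n + j + 1).
    """
    prob = Fraction(1)
    for j in range(k):
        prob *= Fraction(n + j, n + j + 1)
    return prob


def satisfies_fair_game_equations(k, n):
    """Check that h(x) = (x + n)/(n + k) solves h(x) = (h(x-1) + h(x+1))/2."""
    h = {x: Fraction(x + n, n + k) for x in range(-n, k + 1)}
    interior_ok = all(h[x] == (h[x - 1] + h[x + 1]) / 2
                      for x in range(-n + 1, k))
    return h[-n] == 0 and h[k] == 1 and interior_ok


print(up_k_before_down_n(2, 3))  # 3/5
```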


Exercises 

1. An urn contains five red, three orange, and two blue balls. Two balls are randomly
selected. What is the sample space of this experiment? Let X represent the number
of orange balls selected. What are the possible values of X? Calculate P{X = 0}.

2. Let X represent the difference between the number of heads and the number of
tails obtained when a coin is tossed n times. What are the possible values of X?

3. In Exercise 2, if the coin is assumed fair, then, for n = 2, what are the probabilities 
associated with the values that X can take on?

*4. Suppose a die is rolled twice. What are the possible values that the following
random variables can take on?


(a) The maximum value to appear in the two rolls. 

(b) The minimum value to appear in the two rolls. 

(c) The sum of the two rolls. 

(d) The value of the first roll minus the value of the second roll. 

5. If the die in Exercise 4 is assumed fair, calculate the probabilities associated with
the random variables in (a)-(d).

6. Suppose five fair coins are tossed. Let E be the event that all coins land heads.
Define the random variable $I_E$

\[
I_E = \begin{cases}
1, & \text{if } E \text{ occurs} \\
0, & \text{if } E^c \text{ occurs}
\end{cases}
\]

For what outcomes in the original sample space does $I_E$ equal 1? What is
$P\{I_E = 1\}$?

7. Suppose a coin having probability 0.7 of coming up heads is tossed three times. 
Let X denote the number of heads that appear in the three tosses. Determine the 
probability mass function of X. 

8. Suppose the distribution function of X is given by

\[
F(b) = \begin{cases}
0, & b < 0 \\
\frac{1}{2}, & 0 \le b < 1 \\
1, & 1 \le b < \infty
\end{cases}
\]

What is the probability mass function of X?

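9. If the distribution function of X is given by

\[
F(b) = \begin{cases}
0, & b < 0 \\
\frac{1}{2}, & 0 \le b < 1 \\
\frac{3}{5}, & 1 \le b < 2 \\
\frac{4}{5}, & 2 \le b < 3 \\
\frac{9}{10}, & 3 \le b < 3.5 \\
1, & b \ge 3.5
\end{cases}
\]

calculate the probability mass function of X.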

10. Suppose three fair dice are rolled. What is the probability at most one six appears? 

*11. A ball is drawn from an urn containing three white and three black balls. After 
the ball is drawn, it is then replaced and another ball is drawn. This goes on 
indefinitely. What is the probability that of the first four balls drawn, exactly two 
are white? 

12. On a multiple-choice exam with three possible answers for each of the five questions,
what is the probability that a student would get four or more correct answers
just by guessing?

13. An individual claims to have extrasensory perception (ESP). As a test, a fair
coin is flipped ten times, and he is asked to predict in advance the outcome. Our
individual gets seven out of ten correct. What is the probability he would have
done at least this well if he had no ESP? (Explain why the relevant probability is
P{X ≥ 7} and not P{X = 7}.)

14. Suppose X has a binomial distribution with parameters 6 and 1/2. Show that X = 3
is the most likely outcome.

15. Let X be binomially distributed with parameters n and p. Show that as k goes 
from 0 to n, P(X = k) increases monotonically, then decreases monotonically 
reaching its largest value 

(a) in the case that (n + 1)p is an integer, when k equals either (n + 1)p − 1 or
(n + 1)p,

(b) in the case that (n + 1)p is not an integer, when k satisfies (n + 1)p − 1 <
k < (n + 1)p.

Hint: Consider P{X = k}/P{X = k — 1} and see for what values of k it is 
greater or less than 1. 

*16. An airline knows that 5 percent of the people making reservations on a certain 
flight will not show up. Consequently, their policy is to sell 52 tickets for a flight 
that can hold only 50 passengers. What is the probability that there will be a seat 
available for every passenger who shows up? 

17. Suppose that an experiment can result in one of r possible outcomes, the ith
outcome having probability $p_i$, i = 1, ..., r, $\sum_{i=1}^{r} p_i = 1$. If n of these experiments
are performed, and if the outcome of any one of the n does not affect the
outcome of the other n − 1 experiments, then show that the probability that the
first outcome appears $x_1$ times, the second $x_2$ times, and the rth $x_r$ times is

\[ \frac{n!}{x_1!\,x_2! \cdots x_r!}\, p_1^{x_1} p_2^{x_2} \cdots p_r^{x_r} \quad \text{when } x_1 + x_2 + \cdots + x_r = n \]

This is known as the multinomial distribution.

18. In Exercise 17, let $X_i$ denote the number of times that the ith type outcome occurs,
i = 1, ..., r.

(a) For 0 ≤ j ≤ n, use the definition of conditional probability to find
$P\{X_i = x_i,\ i = 1, \ldots, r-1 \mid X_r = j\}$.

(b) What can you conclude about the conditional distribution of $X_1, \ldots, X_{r-1}$
given that $X_r = j$?

(c) Give an intuitive explanation for your answer to part (b).

19. In Exercise 17, let $X_i$ denote the number of times the ith outcome appears, i =
1, ..., r. What is the probability mass function of $X_1 + X_2 + \cdots + X_k$?

20. A television store owner figures that 50 percent of the customers entering his
store will purchase a low end television set, 20 percent will purchase a high end
television set, and 30 percent will just be browsing. If five customers enter his
store on a certain day, what is the probability that two customers purchase high
end sets, one customer purchases a low end set, and two customers purchase
nothing?

21. In Exercise 20, what is the probability that our store owner sells three or more 
televisions on that day? 

22. If a fair coin is successively flipped, find the probability that a head first appears 
on the fifth trial. 

*23. A coin having probability p of coming up heads is successively flipped until the
rth head appears. Argue that X, the number of flips required, will be n, n ≥ r,
with probability

\[ P\{X = n\} = \binom{n-1}{r-1} p^r (1-p)^{n-r}, \quad n \ge r \]

This is known as the negative binomial distribution.

Hint: How many successes must there be in the first n − 1 trials?

24. The probability mass function of X is given by

\[ p(k) = \binom{r+k-1}{r-1} p^r (1-p)^k, \quad k = 0, 1, \ldots \]

Give a possible interpretation of the random variable X. 

Hint: See Exercise 23. 

In Exercises 25 and 26, suppose that two teams are playing a series of games, 
each of which is independently won by team A with probability p and by team B 
with probability 1 — p. The winner of the series is the first team to win i games. 

25. If i = 4, find the probability that a total of 7 games are played. Also show that
this probability is maximized when p = 1/2.

26. Find the expected number of games that are played when 

(a) i = 2; 

(b) i = 3. 

In both cases, show that this number is maximized when p = 1/2.

*27. A fair coin is independently flipped n times, k times by A and n — k times by B. 
Show that the probability that A and B flip the same number of heads is equal to 
the probability that there are a total of k heads. 

28. Suppose that we want to generate a random variable X that is equally likely 
to be either 0 or 1, and that all we have at our disposal is a biased coin that, 
when flipped, lands on heads with some (unknown) probability p. Consider the 
following procedure: 

1. Flip the coin, and let $O_1$, either heads or tails, be the result.

2. Flip the coin again, and let $O_2$ be the result.

3. If $O_1$ and $O_2$ are the same, return to step 1.

4. If $O_2$ is heads, set X = 0, otherwise set X = 1.

(a) Show that the random variable X generated by this procedure is equally 
likely to be either 0 or 1. 





(b) Could we use a simpler procedure that continues to flip the coin until the
last two flips are different, and then sets X = 0 if the final flip is a head,
and sets X = 1 if it is a tail?

29. Consider n independent flips of a coin having probability p of landing heads. 
Say a changeover occurs whenever an outcome differs from the one preceding 
it. For instance, if the results of the flips are H H T H T H H T , then there 
are a total of five changeovers. If p = 1/2, what is the probability there are k
changeovers?

30. Let X be a Poisson random variable with parameter λ. Show that P{X = i}
increases monotonically and then decreases monotonically as i increases, reaching
its maximum when i is the largest integer not exceeding λ.

Hint: Consider P{X = i}/P{X = i − 1}.

31. Compare the Poisson approximation with the correct binomial probability for 
the following cases: 

(a) P{X = 2} when n = 8, p = 0.1.

(b) P{X = 9} when n = 10, p = 0.95.

(c) P{X = 0} when n = 10, p = 0.1.

(d) P{X = 4} when n = 9, p = 0.2.

32. If you buy a lottery ticket in 50 lotteries, in each of which your chance of winning
a prize is 1/100, what is the (approximate) probability that you will win a prize (a)
at least once, (b) exactly once, (c) at least twice?

33. Let X be a random variable with probability density

\[
f(x) = \begin{cases}
c(1 - x^2), & -1 < x < 1 \\
0, & \text{otherwise}
\end{cases}
\]

(a) What is the value of c?

(b) What is the cumulative distribution function of X?

34. Let the probability density of X be given by

\[
f(x) = \begin{cases}
c(4x - 2x^2), & 0 < x < 2 \\
0, & \text{otherwise}
\end{cases}
\]


(a) What is the value of c? 

(b) $P\{\frac{1}{2} < X < \frac{3}{2}\} = ?$

35. The density of X is given by

\[
f(x) = \begin{cases}
10/x^2, & \text{for } x > 10 \\
0, & \text{for } x \le 10
\end{cases}
\]

What is the distribution of X? Find P{X > 20}.

36. A point is uniformly distributed within the disk of radius 1. That is, its density is

\[ f(x, y) = C, \quad 0 \le x^2 + y^2 \le 1 \]

Find the probability that its distance from the origin is less than x, 0 ≤ x ≤ 1.

37. Let $X_1, X_2, \ldots, X_n$ be independent random variables, each having a uniform
distribution over (0, 1). Let $M = \text{maximum}(X_1, X_2, \ldots, X_n)$. Show that the
distribution function of M, $F_M(\cdot)$, is given by

\[ F_M(x) = x^n, \quad 0 \le x \le 1 \]

What is the probability density function of M?
*38. If the density function of X equals

\[
f(x) = \begin{cases}
c e^{-2x}, & 0 < x < \infty \\
0, & x < 0
\end{cases}
\]

find c. What is P{X > 2}?

39. The random variable X has the following probability mass function:

\[ p(1) = \frac{1}{2}, \quad p(2) = \frac{1}{3}, \quad p(24) = \frac{1}{6} \]

Calculate E[X].

40. Suppose that two teams are playing a series of games, each of which is independently
won by team A with probability p and by team B with probability 1 − p.
The winner of the series is the first team to win four games. Find the expected 
number of games that are played, and evaluate this quantity when p = 1/2. 

41. Consider the case of arbitrary p in Exercise 29. Compute the expected number 
of changeovers. 

42. Suppose that each coupon obtained is, independent of what has been previously 
obtained, equally likely to be any of m different types. Find the expected number 
of coupons one needs to obtain in order to have at least one of each type. 

Hint: Let X be the number needed. It is useful to represent X by

\[ X = \sum_{i=1}^{m} X_i \]

where each $X_i$ is a geometric random variable.

43. An urn contains n + m balls, of which n are red and m are black. They are 
withdrawn from the urn, one at a time and without replacement. Let X be the 
number of red balls removed before the first black ball is chosen. We are interested 
in determining E[X]. To obtain this quantity, number the red balls from 1 to n.
Now define the random variables $X_i$, i = 1, ..., n, by

\[
X_i = \begin{cases}
1, & \text{if red ball } i \text{ is taken before any black ball is chosen} \\
0, & \text{otherwise}
\end{cases}
\]


(a) Express X in terms of the $X_i$.

(b) Find E[X].

44. In Exercise 43, let Y denote the number of red balls chosen after the first but
before the second black ball has been chosen.

(a) Express Y as the sum of n random variables, each of which is equal to either
0 or 1.

(b) Find E[Y].

(c) Compare E[Y] to E[X] obtained in Exercise 43. 

(d) Can you explain the result obtained in part (c)? 

45. A total of r keys are to be put, one at a time, in k boxes, with each key independently
being put in box i with probability $p_i$, $\sum_{i=1}^{k} p_i = 1$. Each time a key is
put in a nonempty box, we say that a collision occurs. Find the expected number 
of collisions. 

46. If X is a nonnegative integer valued random variable, show that

\[ E[X] = \sum_{n=1}^{\infty} P\{X \ge n\} = \sum_{n=0}^{\infty} P\{X > n\} \]

Hint: Define the sequence of random variables $I_n$, n ≥ 1, by

\[
I_n = \begin{cases}
1, & \text{if } n \le X \\
0, & \text{if } n > X
\end{cases}
\]

Now express X in terms of the $I_n$.

(b) If X and Y are both nonnegative integer valued random variables, show that

\[ E[XY] = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} P(X \ge n,\ Y \ge m) \]

*47. Consider three trials, each of which is either a success or not. Let X denote the 
number of successes. Suppose that E[X] = 1.8. 

(a) What is the largest possible value of P{X = 3}? 

(b) What is the smallest possible value of P{X = 3}? 

In both cases, construct a probability scenario that results in P{X = 3} having 
the desired value. 

*48. If X is a nonnegative random variable, and g is a differentiable function with
g(0) = 0, then

\[ E[g(X)] = \int_0^{\infty} P(X > t)\, g'(t)\,dt \]

Prove the preceding when X is a continuous random variable. 

*49. Prove that $E[X^2] \ge (E[X])^2$. When do we have equality?

50. Let c be a constant. Show that 


(a) Var(cX) = c² Var(X);

(b) Var(c + X) = Var(X).

51. A coin, having probability p of landing heads, is flipped until a head appears for
the rth time. Let N denote the number of flips required. Calculate E[N].

Hint: There is an easy way of doing this. It involves writing N as the sum of 
r geometric random variables. 

52. (a) Calculate E[X] for the maximum random variable of Exercise 37. 

(b) Calculate E[X] for X as in Exercise 33.

(c) Calculate E[X] for X as in Exercise 34.

53. If X is uniform over (0, 1), calculate $E[X^n]$ and $\mathrm{Var}(X^n)$.

54. Let X and Y each take on either the value 1 or — 1. Let 

p(1, 1) = P{X = 1, Y = 1},
p(1, -1) = P{X = 1, Y = -1},
p(-1, 1) = P{X = -1, Y = 1},
p(-1, -1) = P{X = -1, Y = -1}

Suppose that E[X] = E[Y] = 0. Show that

(a) p(1, 1) = p(-1, -1);

(b) p(1, -1) = p(-1, 1).

Let p = 2p(1, 1). Find

(c) Var(X);

(d) Var(Y);

(e) Cov(X, Y).

55. Suppose that the joint probability mass function of X and Y is 

\[ P(X = i, Y = j) = \binom{j}{i} e^{-2\lambda} \lambda^j / j!, \quad 0 \le i \le j \]

(a) Find the probability mass function of Y. 

(b) Find the probability mass function of X. 

(c) Find the probability mass function of Y — X. 

56. There are n types of coupons. Each newly obtained coupon is, independently, type
i with probability $p_i$, i = 1, ..., n. Find the expected number and the variance
of the number of distinct types obtained in a collection of k coupons.

57. Suppose that X and Y are independent binomial random variables with parameters
(n, p) and (m, p). Argue probabilistically (no computations necessary) that
X + Y is binomial with parameters (n + m, p). 

58. An urn contains 2n balls, of which r are red. The balls are randomly removed in
n successive pairs. Let X denote the number of pairs in which both balls are red.

(a) Find E[X].

(b) Find Var(X). 





59. Let $X_1$, $X_2$, $X_3$, and $X_4$ be independent continuous random variables with a
common distribution function F and let

\[ p = P\{X_1 < X_2 > X_3 < X_4\} \]

(a) Argue that the value of p is the same for all continuous distribution functions
F.

(b) Find p by integrating the joint density function over the appropriate region.

(c) Find p by using the fact that all 4! possible orderings of $X_1, \ldots, X_4$ are
equally likely.

60. Let X and Y be independent random variables with means $\mu_x$ and $\mu_y$ and variances
$\sigma_x^2$ and $\sigma_y^2$. Show that

\[ \mathrm{Var}(XY) = \sigma_x^2\sigma_y^2 + \mu_y^2\sigma_x^2 + \mu_x^2\sigma_y^2 \]

61. Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed continuous
random variables. We say that a record occurs at time n if $X_n > \max(X_1, \ldots, X_{n-1})$.
That is, $X_n$ is a record if it is larger than each of $X_1, \ldots, X_{n-1}$. Show

(a) P{a record occurs at time n} = 1/n;

(b) E[number of records by time n] = $\sum_{i=1}^{n} 1/i$;

(c) Var(number of records by time n) = $\sum_{i=1}^{n} (i - 1)/i^2$;

(d) Let N = min{n: n > 1 and a record occurs at time n}. Show E[N] = ∞.

Hint: For (b) and (c) represent the number of records as the sum of indicator
(that is, Bernoulli) random variables.

62. Let $a_1 < a_2 < \cdots < a_n$ denote a set of n numbers, and consider any permutation
of these numbers. We say that there is an inversion of $a_i$ and $a_j$ in the permutation
if i < j and $a_j$ precedes $a_i$. For instance the permutation 4, 2, 1, 5, 3 has 5
inversions: (4,2), (4,1), (4,3), (2,1), (5,3). Consider now a random permutation
of $a_1, a_2, \ldots, a_n$ (in the sense that each of the n! permutations is equally likely
to be chosen) and let N denote the number of inversions in this permutation.
Also, let

$N_i$ = number of k: k < i, $a_i$ precedes $a_k$ in the permutation

and note that $N = \sum_{i=1}^{n} N_i$.

(a) Show that $N_1, \ldots, N_n$ are independent random variables.

(b) What is the distribution of $N_i$?

(c) Compute E[N] and Var(N).

63. Let X denote the number of white balls selected when k balls are chosen at 
random from an urn containing n white and m black balls. 

(a) Compute P{X = i}. 

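(b) Let, for $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$,

$$X_i = \begin{cases} 1, & \text{if the } i\text{th ball selected is white} \\ 0, & \text{otherwise} \end{cases}$$

$$Y_j = \begin{cases} 1, & \text{if white ball } j \text{ is selected} \\ 0, & \text{otherwise} \end{cases}$$

Compute E[X] in two ways by expressing X first as a function of the $X_i$s and
then of the $Y_j$s.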

*64. Show that Var(X) = 1 when X is the number of men who select their own hats 
in Example 2.31. 

65. The number of traffic accidents on successive days are independent Poisson 
random variables with mean 2. 


(a) Find the probability that 3 of the next 5 days have two accidents. 

(b) Find the probability that there are a total of six accidents over the next 2 days. 

(c) If each accident is independently a “major accident” with probability p, what
is the probability there are no major accidents tomorrow?

66. Show that the random variables $X_1, \ldots, X_n$ are independent if for each
$i = 2, \ldots, n$, $X_i$ is independent of $X_1, \ldots, X_{i-1}$.

Hint: $X_1, \ldots, X_n$ are independent if for any sets $A_1, \ldots, A_n$,

$$P(X_j \in A_j,\ j = 1, \ldots, n) = \prod_{j=1}^{n} P(X_j \in A_j)$$

On the other hand $X_i$ is independent of $X_1, \ldots, X_{i-1}$ if for any sets $A_1, \ldots, A_i$,

$$P(X_i \in A_i \mid X_j \in A_j,\ j = 1, \ldots, i - 1) = P(X_i \in A_i)$$

67. Calculate the moment generating function of the uniform distribution on (0, 1). 
Obtain E[X] and Var[X] by differentiating. 

68. Let X and W be the working and subsequent repair times of a certain machine. 
Let Y = X + W and suppose that the joint probability density of X and Y is 

$$f_{X,Y}(x, y) = \lambda^2 e^{-\lambda y}, \quad 0 < x < y < \infty$$

(a) Find the density of X. 

(b) Find the density of Y. 

(c) Find the joint density of X and W. 

(d) Find the density of W. 

69. In deciding upon the appropriate premium to charge, insurance companies sometimes
use the exponential principle, defined as follows. With X as the random
amount that it will have to pay in claims, the premium charged by the insurance
company is

$$P = \frac{1}{a} \ln\left(E[e^{aX}]\right)$$

where a is some specified positive constant. Find P when X is an exponential
random variable with parameter $\lambda$, and $a = \alpha\lambda$, where $0 < \alpha < 1$.

70. Calculate the moment generating function of a geometric random variable.

*71. Show that the sum of independent identically distributed exponential random 
variables has a gamma distribution. 

72. Successive monthly sales are independent normal random variables with mean
100 and variance 100.

(a) Find the probability that at least one of the next 5 months has sales above 
115. 

(b) Find the probability that the total number of sales over the next 5 months 
exceeds 530. 

73. Consider n people and suppose that each of them has a birthday that is equally
likely to be any of the 365 days of the year. Furthermore, assume that their
birthdays are independent, and let A be the event that no two of them share the
same birthday. Define a “trial” for each of the $\binom{n}{2}$ pairs of people and say that
trial (i, j), i ≠ j, is a success if persons i and j have the same birthday. Let $S_{i,j}$
be the event that trial (i, j) is a success.

(a) Find $P(S_{i,j})$, i ≠ j.

(b) Are $S_{i,j}$ and $S_{k,r}$ independent when i, j, k, r are all distinct?

(c) Are $S_{i,j}$ and $S_{k,j}$ independent when i, j, k are all distinct?

(d) Are $S_{1,2}$, $S_{1,3}$, $S_{2,3}$ independent?

(e) Employ the Poisson paradigm to approximate P(A).

(f) Show that this approximation yields that P(A) ≈ .5 when n = 23.

(g) Let B be the event that no three people have the same birthday. Approximate
the value of n that makes P(B) ≈ .5. (Whereas a simple combinatorial argument
explicitly determines P(A), the exact determination of P(B) is very
complicated.)

Hint: Define a trial for each triplet of people. 

74. If X is Poisson with parameter $\lambda$, show that its Laplace transform is given by

$$g(u) = E[e^{-uX}] = e^{\lambda(e^{-u} - 1)}$$


75. Consider Example 2.48. Find Cov($X_i$, $X_j$) in terms of the $a_{rs}$.

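76. Use Chebyshev’s inequality to prove the weak law of large numbers. Namely, if
$X_1, X_2, \ldots$ are independent and identically distributed with mean $\mu$ and variance
$\sigma^2$ then, for any $\varepsilon > 0$,

$$P\left\{\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| > \varepsilon\right\} \to 0 \quad \text{as } n \to \infty$$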


77. Suppose that X is a random variable with mean 10 and variance 15. What can
we say about $P\{5 < X < 15\}$?

78. Let $X_1, X_2, \ldots, X_{10}$ be independent Poisson random variables with mean 1.

(a) Use the Markov inequality to get a bound on $P\{X_1 + \cdots + X_{10} \ge 15\}$.

(b) Use the central limit theorem to approximate $P\{X_1 + \cdots + X_{10} \ge 15\}$.

79. If X is normally distributed with mean 1 and variance 4, use the tables to find
$P\{2 < X < 3\}$.

*80. Show that

$$\lim_{n \to \infty} e^{-n} \sum_{k=0}^{n} \frac{n^k}{k!} = \frac{1}{2}$$

Hint: Let $X_n$ be Poisson with mean n. Use the central limit theorem to show
that $P\{X_n \le n\} \to \frac{1}{2}$.

81. Let X and Y be independent normal random variables, each having parameters
$\mu$ and $\sigma^2$. Show that X + Y is independent of X − Y.

Hint: Find their joint moment generating function. 

82. Let $\phi(t_1, \ldots, t_n)$ denote the joint moment generating function of $X_1, \ldots, X_n$.

(a) Explain how the moment generating function of $X_i$, $\phi_{X_i}(t_i)$, can be obtained
from $\phi(t_1, \ldots, t_n)$.

(b) Show that $X_1, \ldots, X_n$ are independent if and only if

$$\phi(t_1, \ldots, t_n) = \phi_{X_1}(t_1) \cdots \phi_{X_n}(t_n)$$

83. With $K(t) = \log\left(E[e^{tX}]\right)$, show that

$$K'(0) = E[X], \quad K''(0) = \text{Var}(X)$$


84. Let X denote the number of the events $A_1, \ldots, A_n$ that occur. Express E[X],
Var(X), and $E\left[\binom{X}{k}\right]$ in terms of the quantities $S_k = \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k})$,
$k = 1, \ldots, n$.

85. The standard deviation of a random variable is the positive square root of its variance.
Letting $\sigma_X$ and $\sigma_Y$ denote the standard deviations of the random variables
X and Y, we define the correlation of X and Y by

$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

(a) Starting with the inequality $\text{Var}\left(\frac{X}{\sigma_X} + \frac{Y}{\sigma_Y}\right) \ge 0$, show that $-1 \le \text{Corr}(X, Y)$.

(b) Prove the inequality

$$-1 \le \text{Corr}(X, Y) \le 1$$

(c) If $\sigma_{X+Y}$ is the standard deviation of X + Y, show that

$$\sigma_{X+Y} \le \sigma_X + \sigma_Y$$

86. Each new book donated to a library must be processed. Suppose that the time it
takes a librarian to process a book has mean 10 minutes and standard deviation
3 minutes. If a librarian has 40 books that must be processed one at a time,

(a) approximate the probability that it will take more than 420 minutes to process
all these books; 

(b) approximate the probability that at least 25 books will be processed in the 
first 240 minutes. 

87. Recall that X is said to be a gamma random variable with parameters $(\alpha, \lambda)$ if
its density is

$$f(x) = \lambda e^{-\lambda x} (\lambda x)^{\alpha - 1} / \Gamma(\alpha), \quad x > 0$$

(a) If Z is a standard normal random variable, show that $Z^2$ is a gamma random
variable with parameters (1/2, 1/2).

(b) If $Z_1, \ldots, Z_n$ are independent standard normal random variables, then
$\sum_{i=1}^{n} Z_i^2$ is said to be a chi-squared random variable with n degrees of freedom.
Explain how you can use results from Example 2.39 to show that the
density function of $\sum_{i=1}^{n} Z_i^2$ is

$$f(x) = \frac{e^{-x/2} x^{n/2 - 1}}{2^{n/2} \Gamma(n/2)}, \quad x > 0$$

3.1 Introduction

One of the most useful concepts in probability theory is that of conditional probability 
and conditional expectation. The reason is twofold. First, in practice, we are often 
interested in calculating probabilities and expectations when some partial information 
is available; hence, the desired probabilities and expectations are conditional ones. 
Secondly, in calculating a desired probability or expectation it is often extremely useful 
to first “condition” on some appropriate random variable. 


3.2 The Discrete Case 


Recall that for any two events E and F, the conditional probability of E given F is 
defined, as long as P(F) > 0, by 


$$P(E|F) = \frac{P(EF)}{P(F)}$$


Hence, if X and Y are discrete random variables, then it is natural to define the 
conditional probability mass function of X given that Y = y, by 


$$p_{X|Y}(x|y) = P\{X = x \mid Y = y\} = \frac{P\{X = x, Y = y\}}{P\{Y = y\}} = \frac{p(x, y)}{p_Y(y)}$$


Introduction to Probability Models, Eleventh Edition. http://dx.doi.org/10.1016/B978-0-12-407948-9.00003-7

for all values of y such that $P\{Y = y\} > 0$. Similarly, the conditional probability
distribution function of X given that Y = y is defined, for all y such that $P\{Y = y\} > 0$,
by

$$F_{X|Y}(x|y) = P\{X \le x \mid Y = y\} = \sum_{a \le x} p_{X|Y}(a|y)$$

Finally, the conditional expectation of X given that Y = y is defined by

$$E[X \mid Y = y] = \sum_{x} x P\{X = x \mid Y = y\} = \sum_{x} x\, p_{X|Y}(x|y)$$

In other words, the definitions are exactly as before with the exception that everything
is now conditional on the event that Y = y. If X is independent of Y, then the conditional
mass function, distribution, and expectation are the same as the unconditional ones.
This follows, since if X is independent of Y, then

$$p_{X|Y}(x|y) = P\{X = x \mid Y = y\} = P\{X = x\}$$
Example 3.1 Suppose that p(x, y), the joint probability mass function of X and Y,
is given by

$$p(1, 1) = 0.5, \quad p(1, 2) = 0.1, \quad p(2, 1) = 0.1, \quad p(2, 2) = 0.3$$

Calculate the conditional probability mass function of X given that Y = 1.

Solution: We first note that

$$p_Y(1) = \sum_{x} p(x, 1) = p(1, 1) + p(2, 1) = 0.6$$

Hence,

$$p_{X|Y}(1|1) = P\{X = 1 \mid Y = 1\} = \frac{P\{X = 1, Y = 1\}}{P\{Y = 1\}} = \frac{p(1, 1)}{p_Y(1)} = \frac{5}{6}$$

Similarly,

$$p_{X|Y}(2|1) = \frac{p(2, 1)}{p_Y(1)} = \frac{1}{6}$$
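As a rough illustration of the definitions in this section, the following Python sketch recomputes Example 3.1 with exact fractions. The dictionary layout and the function names (`conditional_pmf_of_x`, `conditional_mean_of_x`) are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# Joint pmf of Example 3.1, keyed by (x, y).
joint = {
    (1, 1): Fraction(5, 10),
    (1, 2): Fraction(1, 10),
    (2, 1): Fraction(1, 10),
    (2, 2): Fraction(3, 10),
}


def conditional_pmf_of_x(joint, y):
    """Return {x: P{X = x | Y = y}} for a joint pmf given as {(x, y): prob}."""
    # Marginal p_Y(y): sum the joint pmf over all x.
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}


def conditional_mean_of_x(joint, y):
    """Return E[X | Y = y] as the mean of the conditional pmf."""
    return sum(x * p for x, p in conditional_pmf_of_x(joint, y).items())


pmf_given_y1 = conditional_pmf_of_x(joint, 1)
```

Under these assumptions, `pmf_given_y1` holds the values 5/6 and 1/6 found in the example.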
Example 3.2 If $X_1$ and $X_2$ are independent binomial random variables with respective
parameters $(n_1, p)$ and $(n_2, p)$, calculate the conditional probability mass function of
$X_1$ given that $X_1 + X_2 = m$.

Solution: With q = 1 − p,

$$P\{X_1 = k \mid X_1 + X_2 = m\} = \frac{P\{X_1 = k, X_1 + X_2 = m\}}{P\{X_1 + X_2 = m\}}$$

$$= \frac{P\{X_1 = k, X_2 = m - k\}}{P\{X_1 + X_2 = m\}}$$

$$= \frac{P\{X_1 = k\} P\{X_2 = m - k\}}{P\{X_1 + X_2 = m\}}$$

$$= \frac{\binom{n_1}{k} p^k q^{n_1 - k} \binom{n_2}{m - k} p^{m - k} q^{n_2 - m + k}}{\binom{n_1 + n_2}{m} p^m q^{n_1 + n_2 - m}}$$

where we have used that $X_1 + X_2$ is a binomial random variable with parameters
$(n_1 + n_2, p)$ (see Example 2.44). Thus, the conditional probability mass function of
$X_1$, given that $X_1 + X_2 = m$, is

$$P\{X_1 = k \mid X_1 + X_2 = m\} = \frac{\binom{n_1}{k} \binom{n_2}{m - k}}{\binom{n_1 + n_2}{m}} \qquad (3.1)$$


The distribution given by Equation (3.1), first seen in Example 2.35, is known as
the hypergeometric distribution. It is the distribution of the number of blue balls
that are chosen when a sample of m balls is randomly chosen from an urn that
contains $n_1$ blue and $n_2$ red balls. (To intuitively see why the conditional distribution
is hypergeometric, consider $n_1 + n_2$ independent trials that each result in a success
with probability p; let $X_1$ represent the number of successes in the first $n_1$ trials and
let $X_2$ represent the number of successes in the final $n_2$ trials. Because all trials have
the same probability of being a success, each of the $\binom{n_1 + n_2}{m}$ subsets of m trials is
equally likely to be the success trials; thus, the number of the m success trials that
are among the first $n_1$ trials is a hypergeometric random variable.) ■
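The claim of Example 3.2 can be checked numerically. The sketch below, with hypothetical helper names and arbitrarily chosen parameter values, computes the conditional pmf straight from the definition using exact fractions and compares it with the hypergeometric formula (3.1); note that the result does not depend on p.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p, k):
    """P{Bin(n, p) = k}, returning 0 outside the support."""
    if k < 0 or k > n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(n1, n2, p, m):
    """P{X1 = k | X1 + X2 = m} for k = 0..m, computed from the definition."""
    joint = {k: binom_pmf(n1, p, k) * binom_pmf(n2, p, m - k) for k in range(m + 1)}
    total = sum(joint.values())  # equals P{X1 + X2 = m}
    return {k: v / total for k, v in joint.items()}


def hypergeometric_pmf(n1, n2, m, k):
    """Right-hand side of Equation (3.1), returning 0 outside the support."""
    if k < 0 or k > n1 or m - k < 0 or m - k > n2:
        return Fraction(0)
    return Fraction(comb(n1, k) * comb(n2, m - k), comb(n1 + n2, m))


# Illustrative parameters only; any 0 < p < 1 gives the same conditional pmf.
cond = conditional_pmf(5, 7, Fraction(3, 10), 6)
```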

Example 3.3 If X and Y are independent Poisson random variables with respective
means $\lambda_1$ and $\lambda_2$, calculate the conditional expected value of X given that X + Y = n.

Solution: Let us first calculate the conditional probability mass function of X given
that X + Y = n. We obtain


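$$P\{X = k \mid X + Y = n\} = \frac{P\{X = k, X + Y = n\}}{P\{X + Y = n\}}$$

$$= \frac{P\{X = k, Y = n - k\}}{P\{X + Y = n\}}$$

$$= \frac{P\{X = k\} P\{Y = n - k\}}{P\{X + Y = n\}}$$

where the last equality follows from the assumed independence of X and Y. Recalling
(see Example 2.37) that X + Y has a Poisson distribution with mean $\lambda_1 + \lambda_2$, the
preceding equation equals

$$P\{X = k \mid X + Y = n\} = \frac{e^{-\lambda_1} \lambda_1^k}{k!} \cdot \frac{e^{-\lambda_2} \lambda_2^{n-k}}{(n - k)!} \left[\frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^n}{n!}\right]^{-1}$$

$$= \frac{n!}{(n - k)!\, k!} \cdot \frac{\lambda_1^k \lambda_2^{n-k}}{(\lambda_1 + \lambda_2)^n}$$

$$= \binom{n}{k} \left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^k \left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{n-k}$$

In other words, the conditional distribution of X given that X + Y = n is the binomial
distribution with parameters n and $\lambda_1/(\lambda_1 + \lambda_2)$. Hence,

$$E[X \mid X + Y = n] = n \frac{\lambda_1}{\lambda_1 + \lambda_2}$$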
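A quick floating-point check of Example 3.3 is sketched below. The function names and the parameter values (means 2 and 3, n = 6) are assumptions chosen for illustration; the code builds the conditional pmf from the Poisson pmfs and compares it with the binomial pmf with parameters n and $\lambda_1/(\lambda_1 + \lambda_2)$.

```python
from math import comb, exp, factorial, isclose


def poisson_pmf(lam, k):
    """P{Poisson(lam) = k}."""
    return exp(-lam) * lam**k / factorial(k)


def conditional_pmf(lam1, lam2, n):
    """List of P{X = k | X + Y = n} for k = 0..n, from the definition."""
    joint = [poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k) for k in range(n + 1)]
    total = sum(joint)  # equals P{X + Y = n}
    return [v / total for v in joint]


def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


lam1, lam2, n = 2.0, 3.0, 6  # illustrative values only
cond = conditional_pmf(lam1, lam2, n)
cond_mean = sum(k * q for k, q in enumerate(cond))
```

With these values the conditional mean should agree, up to rounding, with $n\lambda_1/(\lambda_1 + \lambda_2) = 2.4$.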


Conditional expectations possess all of the properties of ordinary expectations. For
example, identities such as

$$E\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = y\right] = \sum_{i=1}^{n} E[X_i \mid Y = y]$$

$$E[h(X) \mid Y = y] = \sum_{x} h(x) P\{X = x \mid Y = y\}$$

remain valid.

Example 3.4 There are n components. On a rainy day, component i will function
with probability $p_i$; on a nonrainy day, component i will function with probability $q_i$,
for $i = 1, \ldots, n$. It will rain tomorrow with probability $\alpha$. Calculate the conditional
expected number of components that function tomorrow, given that it rains.

Solution: Let

$$X_i = \begin{cases} 1, & \text{if component } i \text{ functions tomorrow} \\ 0, & \text{otherwise} \end{cases}$$

Then, with Y defined to equal 1 if it rains tomorrow, and 0 otherwise, the desired
conditional expectation is obtained as follows.

$$E\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = 1\right] = \sum_{i=1}^{n} E[X_i \mid Y = 1] = \sum_{i=1}^{n} p_i$$


3.3 The Continuous Case 


If X and Y have a joint probability density function f(x, y), then the conditional
probability density function of X, given that Y = y, is defined for all values of y such
that $f_Y(y) > 0$, by

$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}$$

To motivate this definition, multiply the left side by dx and the right side by (dx dy)/dy
to get

$$f_{X|Y}(x|y)\, dx = \frac{f(x, y)\, dx\, dy}{f_Y(y)\, dy}$$

$$\approx \frac{P\{x \le X \le x + dx,\ y \le Y \le y + dy\}}{P\{y \le Y \le y + dy\}}$$

$$= P\{x \le X \le x + dx \mid y \le Y \le y + dy\}$$

In other words, for small values dx and dy, $f_{X|Y}(x|y)\, dx$ is approximately the conditional
probability that X is between x and x + dx given that Y is between y and
y + dy.

The conditional expectation of X, given that Y = y, is defined for all values of y
such that $f_Y(y) > 0$, by

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\, dx$$


Example 3.5 Suppose the joint density of X and Y is given by

$$f(x, y) = \begin{cases} 6xy(2 - x - y), & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{otherwise} \end{cases}$$

Compute the conditional expectation of X given that Y = y, where 0 < y < 1.

Solution: We first compute the conditional density

$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)} = \frac{6xy(2 - x - y)}{\int_0^1 6xy(2 - x - y)\, dx} = \frac{6xy(2 - x - y)}{y(4 - 3y)} = \frac{6x(2 - x - y)}{4 - 3y}$$

Hence,

$$E[X \mid Y = y] = \int_0^1 \frac{6x^2(2 - x - y)}{4 - 3y}\, dx = \frac{(2 - y)2 - \frac{6}{4}}{4 - 3y} = \frac{5 - 4y}{8 - 6y}$$
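The closed form in Example 3.5 can be sanity-checked by numerical integration. The following is a minimal sketch using a plain midpoint rule; the helper names and the step count are arbitrary choices made for illustration, not part of the text.

```python
def joint_density(x, y):
    """Joint density f(x, y) of Example 3.5."""
    if 0 < x < 1 and 0 < y < 1:
        return 6 * x * y * (2 - x - y)
    return 0.0


def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def conditional_mean(y):
    """Approximate E[X | Y = y] as the ratio of two integrals over x."""
    marginal = integrate(lambda x: joint_density(x, y), 0.0, 1.0)  # f_Y(y)
    numerator = integrate(lambda x: x * joint_density(x, y), 0.0, 1.0)
    return numerator / marginal


def closed_form(y):
    """The formula (5 - 4y) / (8 - 6y) derived in the example."""
    return (5 - 4 * y) / (8 - 6 * y)
```

For any y in (0, 1), `conditional_mean(y)` and `closed_form(y)` should agree to several decimal places.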


Example 3.6 (The t-Distribution) If Z and Y are independent, with Z having a
standard normal distribution and Y having a chi-squared distribution with n degrees of
freedom, then the random variable T defined by

$$T = \frac{Z}{\sqrt{Y/n}} = \sqrt{n}\, \frac{Z}{\sqrt{Y}}$$

is said to be a t-random variable with n degrees of freedom. To compute its density
function, we first derive the conditional density of T given that Y = y. Because Z and Y
are independent, the conditional distribution of T given that Y = y is the distribution
of $\sqrt{n/y}\, Z$, which is normal with mean 0 and variance n/y. Hence, the conditional
density function of T given that Y = y is

$$f_{T|Y}(t|y) = \frac{1}{\sqrt{2\pi n/y}} e^{-t^2 y/2n} = \frac{y^{1/2}}{\sqrt{2\pi n}} e^{-t^2 y/2n}, \quad -\infty < t < \infty$$

Using the preceding, along with the following formula for the chi-squared density
derived in Exercise 87 of Chapter 2,

$$f_Y(y) = \frac{e^{-y/2} y^{n/2 - 1}}{2^{n/2} \Gamma(n/2)}, \quad y > 0$$

we obtain the density function of T:

$$f_T(t) = \int_0^{\infty} f_{T,Y}(t, y)\, dy = \int_0^{\infty} f_{T|Y}(t|y) f_Y(y)\, dy$$

With

$$K = \sqrt{\pi n}\, 2^{(n+1)/2}\, \Gamma(n/2), \qquad c = \frac{t^2 + n}{2n} = \frac{1}{2}\left(1 + \frac{t^2}{n}\right)$$

this yields

$$f_T(t) = \frac{1}{K} \int_0^{\infty} e^{-cy} y^{(n-1)/2}\, dy$$

$$= \frac{c^{-(n+1)/2}}{K} \int_0^{\infty} e^{-x} x^{(n-1)/2}\, dx \quad (\text{by letting } x = cy)$$

$$= \frac{c^{-(n+1)/2}}{K}\, \Gamma\left(\frac{n+1}{2}\right)$$

$$= \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\, \Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \quad -\infty < t < \infty$$


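Example 3.7 The joint density of X and Y is given by

$$f(x, y) = \begin{cases} \frac{1}{2} y e^{-xy}, & 0 < x < \infty,\ 0 < y < 2 \\ 0, & \text{otherwise} \end{cases}$$

What is $E[e^{X/2} \mid Y = 1]$?

Solution: The conditional density of X, given that Y = 1, is given by

$$f_{X|Y}(x|1) = \frac{f(x, 1)}{f_Y(1)} = \frac{\frac{1}{2} e^{-x}}{\int_0^{\infty} \frac{1}{2} e^{-x}\, dx} = e^{-x}$$

Hence, by Proposition 2.1,

$$E[e^{X/2} \mid Y = 1] = \int_0^{\infty} e^{x/2} f_{X|Y}(x|1)\, dx = \int_0^{\infty} e^{x/2} e^{-x}\, dx = 2$$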


Example 3.8 Let $X_1$ and $X_2$ be independent exponential random variables with rates
$\mu_1$ and $\mu_2$. Find the conditional density of $X_1$ given that $X_1 + X_2 = t$.

Solution: To begin, let us first note that if f(x, y) is the joint density of X, Y, then
the joint density of X and X + Y is

$$f_{X,X+Y}(x, t) = f(x, t - x)$$

which is easily seen by noting that the Jacobian of the transformation

$$g_1(x, y) = x, \quad g_2(x, y) = x + y$$

is equal to 1.

Applying the preceding to our example yields

$$f_{X_1|X_1+X_2}(x|t) = \frac{f_{X_1,X_1+X_2}(x, t)}{f_{X_1+X_2}(t)}$$

$$= \frac{\mu_1 e^{-\mu_1 x} \mu_2 e^{-\mu_2 (t - x)}}{f_{X_1+X_2}(t)}, \quad 0 \le x \le t$$

$$= C e^{-(\mu_1 - \mu_2)x}, \quad 0 \le x \le t$$

where

$$C = \frac{\mu_1 \mu_2 e^{-\mu_2 t}}{f_{X_1+X_2}(t)}$$

Now, if $\mu_1 = \mu_2$, then

$$f_{X_1|X_1+X_2}(x|t) = C, \quad 0 \le x \le t$$

yielding that C = 1/t and that $X_1$ given $X_1 + X_2 = t$ is uniformly distributed on
(0, t). On the other hand, if $\mu_1 \ne \mu_2$, then we use

$$1 = \int_0^t f_{X_1|X_1+X_2}(x|t)\, dx = \frac{C}{\mu_1 - \mu_2}\left(1 - e^{-(\mu_1 - \mu_2)t}\right)$$

to obtain

$$C = \frac{\mu_1 - \mu_2}{1 - e^{-(\mu_1 - \mu_2)t}}$$

thus yielding the result:

$$f_{X_1|X_1+X_2}(x|t) = \frac{(\mu_1 - \mu_2) e^{-(\mu_1 - \mu_2)x}}{1 - e^{-(\mu_1 - \mu_2)t}}$$

An interesting byproduct of our analysis is that

$$f_{X_1+X_2}(t) = \frac{\mu_1 \mu_2 e^{-\mu_2 t}}{C} = \begin{cases} \mu^2 t e^{-\mu t}, & \text{if } \mu_1 = \mu_2 = \mu \\ \dfrac{\mu_1 \mu_2 \left(e^{-\mu_2 t} - e^{-\mu_1 t}\right)}{\mu_1 - \mu_2}, & \text{if } \mu_1 \ne \mu_2 \end{cases}$$


3.4 Computing Expectations by Conditioning 

Let us denote by E[X|Y] that function of the random variable Y whose value at Y = y
is E[X|Y = y]. Note that E[X|Y] is itself a random variable. An extremely important
property of conditional expectation is that for all random variables X and Y

$$E[X] = E[E[X|Y]] \qquad (3.2)$$

If Y is a discrete random variable, then Equation (3.2) states that

$$E[X] = \sum_{y} E[X \mid Y = y] P\{Y = y\} \qquad (3.2a)$$

while if Y is continuous with density $f_Y(y)$, then Equation (3.2) says that

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y] f_Y(y)\, dy \qquad (3.2b)$$

We now give a proof of Equation (3.2) in the case where X and Y are both discrete
random variables.

Proof of Equation (3.2) When X and Y Are Discrete. We must show that

$$E[X] = \sum_{y} E[X \mid Y = y] P\{Y = y\} \qquad (3.3)$$

Now, the right side of the preceding can be written

$$\sum_{y} E[X \mid Y = y] P\{Y = y\} = \sum_{y} \sum_{x} x P\{X = x \mid Y = y\} P\{Y = y\}$$

$$= \sum_{y} \sum_{x} x \frac{P\{X = x, Y = y\}}{P\{Y = y\}} P\{Y = y\}$$

$$= \sum_{y} \sum_{x} x P\{X = x, Y = y\}$$

$$= \sum_{x} x \sum_{y} P\{X = x, Y = y\}$$

$$= \sum_{x} x P\{X = x\}$$

$$= E[X]$$

and the result is obtained. ■ 

One way to understand Equation (3.3) is to interpret it as follows. It states that to
calculate E[X] we may take a weighted average of the conditional expected value of
X given that Y = y, each of the terms E[X|Y = y] being weighted by the probability
of the event on which it is conditioned.

The following examples will indicate the usefulness of Equation (3.2). 

Example 3.9 Sam will read either one chapter of his probability book or one chapter 
of his history book. If the number of misprints in a chapter of his probability book is 
Poisson distributed with mean 2 and if the number of misprints in his history chapter is 
Poisson distributed with mean 5, then assuming Sam is equally likely to choose either 
book, what is the expected number of misprints that Sam will come across? 

Solution: Letting X denote the number of misprints and letting

$$Y = \begin{cases} 1, & \text{if Sam chooses his history book} \\ 2, & \text{if Sam chooses his probability book} \end{cases}$$

then

$$E[X] = E[X \mid Y = 1] P\{Y = 1\} + E[X \mid Y = 2] P\{Y = 2\} = 5\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{2}\right) = \tfrac{7}{2}$$
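Equation (3.2a) is a weighted average, which the short sketch below makes explicit for Example 3.9. The function name `total_expectation` and the list-of-pairs input format are assumptions made for illustration.

```python
from fractions import Fraction


def total_expectation(cases):
    """Compute E[X] from pairs (P{Y = y}, E[X | Y = y]) covering every value y."""
    cases = list(cases)
    if sum(p for p, _ in cases) != 1:
        raise ValueError("the probabilities P{Y = y} must sum to 1")
    # Weighted average of the conditional means, as in Equation (3.2a).
    return sum(p * mean for p, mean in cases)


# Example 3.9: history book (mean 5) or probability book (mean 2), equally likely.
sam = [(Fraction(1, 2), 5), (Fraction(1, 2), 2)]
expected_misprints = total_expectation(sam)
```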


Example 3.10 (The Expectation of the Sum of a Random Number of Random 
Variables) Suppose that the expected number of accidents per week at an industrial 
plant is four. Suppose also that the numbers of workers injured in each accident are 
independent random variables with a common mean of 2. Assume also that the number 
of workers injured in each accident is independent of the number of accidents that 
occur. What is the expected number of injuries during a week? 


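Solution: Letting N denote the number of accidents and $X_i$ the number injured in
the ith accident, $i = 1, 2, \ldots$, then the total number of injuries can be expressed as
$\sum_{i=1}^{N} X_i$. Now,

$$E\left[\sum_{i=1}^{N} X_i\right] = E\left[E\left[\sum_{i=1}^{N} X_i \,\Big|\, N\right]\right]$$

But

$$E\left[\sum_{i=1}^{N} X_i \,\Big|\, N = n\right] = E\left[\sum_{i=1}^{n} X_i \,\Big|\, N = n\right]$$

$$= E\left[\sum_{i=1}^{n} X_i\right] \quad \text{by the independence of } X_i \text{ and } N$$

$$= nE[X]$$

which yields

$$E\left[\sum_{i=1}^{N} X_i \,\Big|\, N\right] = NE[X]$$

and thus

$$E\left[\sum_{i=1}^{N} X_i\right] = E[NE[X]] = E[N]E[X]$$

Therefore, in our example, the expected number of injuries during a week equals
4 × 2 = 8. ■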
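The identity E[N]E[X] for the mean of a random sum can be verified exactly on a small case. In the sketch below the two pmfs are hypothetical (the example only specifies the means 4 and 2), and the helper names are illustrative; the code builds the pmf of the random sum by repeated convolution and compares its mean with the product of the means.

```python
from fractions import Fraction


def mean(pmf):
    """Mean of a pmf given as {value: probability}."""
    return sum(v * p for v, p in pmf.items())


def convolve(pmf_a, pmf_b):
    """pmf of the sum of two independent random variables."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out


def compound_pmf(pmf_n, pmf_x):
    """pmf of X_1 + ... + X_N, with N ~ pmf_n independent of the iid X_i ~ pmf_x."""
    result = {}
    n_fold = {0: Fraction(1)}  # pmf of the empty sum (n = 0)
    for n in range(max(pmf_n) + 1):
        weight = pmf_n.get(n, Fraction(0))  # P{N = n}
        for s, p in n_fold.items():
            result[s] = result.get(s, Fraction(0)) + weight * p
        n_fold = convolve(n_fold, pmf_x)  # move on to the (n + 1)-fold sum
    return result


# Hypothetical distributions with E[N] = 4 and E[X] = 2.
accidents = {2: Fraction(1, 4), 4: Fraction(1, 2), 6: Fraction(1, 4)}
injuries = {1: Fraction(1, 2), 3: Fraction(1, 2)}
total = compound_pmf(accidents, injuries)
```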

The random variable $\sum_{i=1}^{N} X_i$, equal to the sum of a random number N of independent
and identically distributed random variables that are also independent of N,
is called a compound random variable. As just shown in Example 3.10, the expected
value of a compound random variable is E[X]E[N]. Its variance will be derived in
Example 3.19.

Example 3.11 (The Mean of a Geometric Distribution) A coin, having probability
p of coming up heads, is to be successively flipped until the first head appears. What 
is the expected number of flips required? 

Solution: Let N be the number of flips required, and let

$$Y = \begin{cases} 1, & \text{if the first flip results in a head} \\ 0, & \text{if the first flip results in a tail} \end{cases}$$

Now,

$$E[N] = E[N \mid Y = 1] P\{Y = 1\} + E[N \mid Y = 0] P\{Y = 0\}$$

$$= pE[N \mid Y = 1] + (1 - p)E[N \mid Y = 0] \qquad (3.4)$$

However,

$$E[N \mid Y = 1] = 1, \quad E[N \mid Y = 0] = 1 + E[N] \qquad (3.5)$$

To see why Equation (3.5) is true, consider E[N|Y = 1]. Since Y = 1, we know that
the first flip resulted in heads and so, clearly, the expected number of flips required
is 1. On the other hand if Y = 0, then the first flip resulted in tails. However,
since the successive flips are assumed independent, it follows that, after the first
tail, the expected additional number of flips until the first head is just E[N]. Hence
E[N|Y = 0] = 1 + E[N]. Substituting Equation (3.5) into Equation (3.4) yields

$$E[N] = p + (1 - p)(1 + E[N])$$

or

$$E[N] = 1/p$$ ■

Because the random variable N is a geometric random variable with probability
mass function $p(n) = p(1 - p)^{n-1}$, its expectation could easily have been computed
from $E[N] = \sum_{n=1}^{\infty} n\, p(n)$ without recourse to conditional expectation. However, if you
attempt to obtain the solution to our next example without using conditional expectation,
you will quickly learn what a useful technique “conditioning” can be.

Example 3.12 A miner is trapped in a mine containing three doors. The first door 
leads to a tunnel that takes him to safety after two hours of travel. The second door 
leads to a tunnel that returns him to the mine after three hours of travel. The third 
door leads to a tunnel that returns him to his mine after five hours. Assuming that the 
miner is at all times equally likely to choose any one of the doors, what is the expected 
length of time until the miner reaches safety? 

Solution: Let X denote the time until the miner reaches safety, and let Y denote
the door he initially chooses. Now,

$$E[X] = E[X \mid Y = 1] P\{Y = 1\} + E[X \mid Y = 2] P\{Y = 2\} + E[X \mid Y = 3] P\{Y = 3\}$$

$$= \tfrac{1}{3}\left(E[X \mid Y = 1] + E[X \mid Y = 2] + E[X \mid Y = 3]\right)$$

However,

$$E[X \mid Y = 1] = 2,$$

$$E[X \mid Y = 2] = 3 + E[X],$$

$$E[X \mid Y = 3] = 5 + E[X] \qquad (3.6)$$

To understand why this is correct consider, for instance, E[X|Y = 2], and reason
as follows. If the miner chooses the second door, then he spends three hours in the
tunnel and then returns to the mine. But once he returns to the mine the problem is
as before, and hence his expected additional time until safety is just E[X]. Hence
E[X|Y = 2] = 3 + E[X]. The argument behind the other equalities in Equation
(3.6) is similar. Hence,

$$E[X] = \tfrac{1}{3}(2 + 3 + E[X] + 5 + E[X]) \quad \text{or} \quad E[X] = 10$$ ■
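The conditioning argument of Example 3.12 amounts to solving one linear equation, and the answer can also be estimated by simulation. The sketch below does both; the `DOORS` table layout, the function names, the seed, and the number of simulated runs are illustrative assumptions.

```python
import random
from fractions import Fraction

# Each door: (probability of choosing it, hours spent, True if it leads to safety).
DOORS = [
    (Fraction(1, 3), 2, True),
    (Fraction(1, 3), 3, False),
    (Fraction(1, 3), 5, False),
]


def expected_time_by_conditioning(doors):
    """Solve E[X] = sum_i p_i * (t_i + [door i returns to the mine] * E[X])."""
    mean_first_leg = sum(p * t for p, t, _ in doors)
    prob_return = sum(p for p, _, safe in doors if not safe)
    return mean_first_leg / (1 - prob_return)


def simulate_escape_time(doors, rng):
    """Simulate one escape and return the total hours until safety."""
    weights = [float(p) for p, _, _ in doors]
    total = 0
    while True:
        _, hours, safe = rng.choices(doors, weights=weights)[0]
        total += hours
        if safe:
            return total


def estimate_by_simulation(doors, runs, seed=0):
    rng = random.Random(seed)
    return sum(simulate_escape_time(doors, rng) for _ in range(runs)) / runs
```

The exact solver returns 10 hours for the three doors of the example, and the simulated average should land close to that value for a large number of runs.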

Example 3.13 (Multinomial Covariances) Consider n independent trials, each of
which results in one of the outcomes $1, \ldots, r$, with respective probabilities $p_1, \ldots, p_r$,
$\sum_{i=1}^{r} p_i = 1$. If we let $N_i$ denote the number of trials that result in outcome i, then
$(N_1, \ldots, N_r)$ is said to have a multinomial distribution. For i ≠ j, let us compute

$$\text{Cov}(N_i, N_j) = E[N_i N_j] - E[N_i]E[N_j]$$

Because each trial independently results in outcome i with probability $p_i$, it follows
that $N_i$ is binomial with parameters $(n, p_i)$, giving that $E[N_i]E[N_j] = n^2 p_i p_j$. To
compute $E[N_i N_j]$, condition on $N_i$ to obtain

$$E[N_i N_j] = \sum_{k=0}^{n} E[N_i N_j \mid N_i = k] P(N_i = k)$$

$$= \sum_{k=0}^{n} k E[N_j \mid N_i = k] P(N_i = k)$$

Now, given that k of the n trials result in outcome i, each of the other n − k trials
independently results in outcome j with probability

$$P(j \mid \text{not } i) = \frac{p_j}{1 - p_i}$$

thus showing that the conditional distribution of $N_j$, given that $N_i = k$, is binomial
with parameters $\left(n - k, \frac{p_j}{1 - p_i}\right)$. Using this yields

$$E[N_i N_j] = \sum_{k=0}^{n} k(n - k) \frac{p_j}{1 - p_i} P(N_i = k)$$

$$= \frac{p_j}{1 - p_i} \left(n \sum_{k=0}^{n} k P(N_i = k) - \sum_{k=0}^{n} k^2 P(N_i = k)\right)$$

$$= \frac{p_j}{1 - p_i} \left(nE[N_i] - E[N_i^2]\right)$$

Because $N_i$ is binomial with parameters $(n, p_i)$,

$$E[N_i^2] = \text{Var}(N_i) + (E[N_i])^2 = np_i(1 - p_i) + (np_i)^2$$

Hence,

$$E[N_i N_j] = \frac{p_j}{1 - p_i} \left[n^2 p_i - np_i(1 - p_i) - n^2 p_i^2\right]$$

$$= \frac{n p_i p_j}{1 - p_i} \left[n - np_i - (1 - p_i)\right]$$

$$= n(n - 1) p_i p_j$$

which yields the result

$$\text{Cov}(N_i, N_j) = n(n - 1) p_i p_j - n^2 p_i p_j = -n p_i p_j$$
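For small n the multinomial covariance can be confirmed by brute force. The sketch below enumerates every sequence of trial outcomes with exact fractions; the function name, the 0-based outcome indices, and the probabilities (1/2, 1/3, 1/6) are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def multinomial_covariance(n, probs, i, j):
    """Cov(N_i, N_j) by enumerating all len(probs)**n outcome sequences.

    Outcomes are indexed from 0, so outcome 1 of the text is index 0 here.
    """
    e_i = e_j = e_ij = Fraction(0)
    for outcome in product(range(len(probs)), repeat=n):
        weight = Fraction(1)
        for o in outcome:  # independent trials: multiply the probabilities
            weight *= probs[o]
        n_i, n_j = outcome.count(i), outcome.count(j)
        e_i += weight * n_i
        e_j += weight * n_j
        e_ij += weight * n_i * n_j
    return e_ij - e_i * e_j


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative values
cov_01 = multinomial_covariance(5, probs, 0, 1)
```

The enumeration is exponential in n, so it is only practical as a check for a handful of trials.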


Example 3.14 (The Matching Rounds Problem) Suppose in Example 2.31 that
those choosing their own hats depart, while the others (those without a match) put their
selected hats in the center of the room, mix them up, and then reselect. Also, suppose
that this process continues until each individual has his own hat.

(a) Find $E[R_n]$ where $R_n$ is the number of rounds that are necessary when n individuals
are initially present.

(b) Find $E[S_n]$ where $S_n$ is the total number of selections made by the n individuals,
n ≥ 2.

(c) Find the expected number of false selections made by one of the n people, n ≥ 2.

Solution: (a) It follows from the results of Example 2.31 that no matter how many
people remain there will, on average, be one match per round. Hence, one might
suggest that $E[R_n] = n$. This turns out to be true, and an induction proof will
now be given. Because it is obvious that $E[R_1] = 1$, assume that $E[R_k] = k$ for
$k = 1, \ldots, n - 1$. To compute $E[R_n]$, start by conditioning on $X_n$, the number of
matches that occur in the first round. This gives

$$E[R_n] = \sum_{i=0}^{n} E[R_n \mid X_n = i] P\{X_n = i\}$$

Now, given a total of i matches in the initial round, the number of rounds needed
will equal 1 plus the number of rounds that are required when n − i persons are to
be matched with their hats. Therefore,

$$E[R_n] = \sum_{i=0}^{n} (1 + E[R_{n-i}]) P\{X_n = i\}$$

$$= 1 + E[R_n] P\{X_n = 0\} + \sum_{i=1}^{n} E[R_{n-i}] P\{X_n = i\}$$

$$= 1 + E[R_n] P\{X_n = 0\} + \sum_{i=1}^{n} (n - i) P\{X_n = i\} \quad \text{(by the induction hypothesis)}$$

$$= 1 + E[R_n] P\{X_n = 0\} + n(1 - P\{X_n = 0\}) - E[X_n]$$

$$= E[R_n] P\{X_n = 0\} + n(1 - P\{X_n = 0\})$$

where the final equality used the result, established in Example 2.31, that $E[X_n] = 1$.
Since the preceding equation implies that $E[R_n] = n$, the result is proven.

(b) For n ≥ 2, conditioning on $X_n$, the number of matches in round 1, gives

$$E[S_n] = \sum_{i=0}^{n} E[S_n \mid X_n = i] P\{X_n = i\}$$

$$= \sum_{i=0}^{n} (n + E[S_{n-i}]) P\{X_n = i\}$$

$$= n + \sum_{i=0}^{n} E[S_{n-i}] P\{X_n = i\}$$

where $E[S_0] = 0$. To solve the preceding equation, rewrite it as

$$E[S_n] = n + E[S_{n - X_n}]$$

Now, if there were exactly one match in each round, then it would take a total of
$1 + 2 + \cdots + n = n(n + 1)/2$ selections. Thus, let us try a solution of the form
$E[S_n] = an + bn^2$. For the preceding equation to be satisfied by a solution of this
type, for n ≥ 2, we need

$$an + bn^2 = n + E[a(n - X_n) + b(n - X_n)^2]$$

or, equivalently,

$$an + bn^2 = n + a(n - E[X_n]) + b(n^2 - 2nE[X_n] + E[X_n^2])$$

Now, using the results of Example 2.31 and Exercise 64 of Chapter 2 that $E[X_n] =
\text{Var}(X_n) = 1$, the preceding will be satisfied if

$$an + bn^2 = n + an - a + bn^2 - 2nb + 2b$$

and this will be valid provided that b = 1/2, a = 1. That is,

$$E[S_n] = n + n^2/2$$

satisfies the recursive equation for $E[S_n]$.

The formal proof that $E[S_n] = n + n^2/2$, n ≥ 2, is obtained by induction on n. It
is true when n = 2 (since, in this case, the number of selections is twice the number
of rounds and the number of rounds is a geometric random variable with parameter
p = 1/2). Now, the recursion gives

$$E[S_n] = n + E[S_n] P\{X_n = 0\} + \sum_{i=1}^{n} E[S_{n-i}] P\{X_n = i\}$$

Hence, upon assuming that $E[S_0] = E[S_1] = 0$, $E[S_k] = k + k^2/2$, for $k = 2, \ldots,
n - 1$ and using that $P\{X_n = n - 1\} = 0$, we see that

$$E[S_n] = n + E[S_n] P\{X_n = 0\} + \sum_{i=1}^{n} [n - i + (n - i)^2/2] P\{X_n = i\}$$

$$= n + E[S_n] P\{X_n = 0\} + (n + n^2/2)(1 - P\{X_n = 0\}) - (n + 1)E[X_n] + E[X_n^2]/2$$

Substituting the identities $E[X_n] = 1$, $E[X_n^2] = 2$ in the preceding shows that

$$E[S_n] = n + n^2/2$$

and the induction proof is complete.

(c) If we let $C_j$ denote the number of hats chosen by person j, $j = 1, \ldots, n$ then

$$\sum_{j=1}^{n} C_j = S_n$$

Taking expectations, and using the fact that each $C_j$ has the same mean, yields the
result

$$E[C_j] = E[S_n]/n = 1 + n/2$$

Hence, the expected number of false selections by person j is

$$E[C_j - 1] = n/2.$$ ■
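The results $E[R_n] = n$ and $E[S_n] = n + n^2/2$ can be checked empirically. The following is a minimal Monte Carlo sketch; the function names, the seed, and the trial count are arbitrary choices, and the output is only an estimate, not an exact value.

```python
import random


def play_matching_rounds(n, rng):
    """Return (rounds, selections) for one run started with n people."""
    remaining, rounds, selections = n, 0, 0
    while remaining > 0:
        # Relabel the remaining people 0..remaining-1; hat i belongs to person i.
        hats = list(range(remaining))
        rng.shuffle(hats)  # person i selects hats[i]
        matches = sum(1 for person, hat in enumerate(hats) if person == hat)
        rounds += 1
        selections += remaining  # everyone still present selects once
        remaining -= matches  # matched people depart
    return rounds, selections


def estimate_means(n, trials, seed=0):
    """Estimate (E[R_n], E[S_n]) by averaging over independent runs."""
    rng = random.Random(seed)
    total_rounds = total_selections = 0
    for _ in range(trials):
        r, s = play_matching_rounds(n, rng)
        total_rounds += r
        total_selections += s
    return total_rounds / trials, total_selections / trials
```

For n = 10, the two estimates should be near 10 and 60 respectively when the number of trials is large.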

Example 3.15 Independent trials, each of which is a success with probability p, 
are performed until there are k consecutive successes. What is the mean number of 
necessary trials? 

Solution: Let $N_k$ denote the number of necessary trials to obtain k consecutive
successes, and let $M_k = E[N_k]$. We will determine $M_k$ by deriving and then solving
a recursive equation that it satisfies. To begin, write

$$N_k = N_{k-1} + A_{k-1,k}$$

where $N_{k-1}$ is the number of trials needed for k − 1 consecutive successes, and
$A_{k-1,k}$ is the number of additional trials needed to go from having k − 1 successes
in a row to having k in a row. Taking expectations gives that,

$$M_k = M_{k-1} + E[A_{k-1,k}]$$

To determine $E[A_{k-1,k}]$, condition on the next trial after there have been k − 1
successes in a row. If it is a success then that gives k in a row and no additional trials
after that are needed; if it is a failure then at that point we are starting all over and
so the expected additional number from then on would be $E[N_k]$. Thus,

$$E[A_{k-1,k}] = 1 \cdot p + (1 + M_k)(1 - p) = 1 + (1 - p)M_k$$

giving that

$$M_k = M_{k-1} + 1 + (1 - p)M_k$$

or

$$M_k = \frac{1}{p} + \frac{M_{k-1}}{p}$$

Since $N_1$, the time of the first success, is geometric with parameter p, we see that

$$M_1 = \frac{1}{p}$$

and, recursively

$$M_2 = \frac{1}{p} + \frac{1}{p^2}$$

$$M_3 = \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3}$$

and, in general,

$$M_k = \frac{1}{p} + \frac{1}{p^2} + \cdots + \frac{1}{p^k}$$ ■
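The recursion $M_k = 1/p + M_{k-1}/p$ and its closed form are easy to compare with exact arithmetic. In this sketch the function names are illustrative, and starting the recursion from $M_0 = 0$ is a convenience assumption that reproduces $M_1 = 1/p$.

```python
from fractions import Fraction


def mean_trials_recursive(k, p):
    """Iterate M_k = 1/p + M_{k-1}/p, starting from M_0 = 0."""
    m = Fraction(0)  # no trials are needed for a run of length 0
    for _ in range(k):
        m = 1 / p + m / p
    return m


def mean_trials_closed_form(k, p):
    """M_k = 1/p + 1/p**2 + ... + 1/p**k."""
    return sum((1 / p) ** i for i in range(1, k + 1))


# Fair coin, three heads in a row: 2 + 4 + 8 = 14 flips on average.
fair_three = mean_trials_recursive(3, Fraction(1, 2))
```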

Example 3.16 (Analyzing the Quick-Sort Algorithm) Suppose we are given a set
of n distinct values, $x_1, \ldots, x_n$, and we desire to put these values in increasing order
or, as it is commonly called, to sort them. An efficient procedure for accomplishing
this is the quick-sort algorithm, which is defined recursively as follows: When n = 2
the algorithm compares the two values and puts them in the appropriate order. When
n > 2 it starts by choosing at random one of the n values, say $x_i$, and then compares
each of the other n − 1 values with $x_i$, noting which are smaller and which are larger
than $x_i$. Letting $S_i$ denote the set of elements smaller than $x_i$, and $\bar{S}_i$ the set of elements
greater than $x_i$, the algorithm now sorts the set $S_i$ and the set $\bar{S}_i$. The final ordering,
therefore, consists of the ordered set of the elements in $S_i$, then $x_i$, and then the ordered
set of the elements in $\bar{S}_i$. For instance, suppose that the set of elements is 10, 5, 8, 2, 1,
4, 7. We start by choosing one of these values at random (that is, each of the 7 values
has probability of 1/7 of being chosen). Suppose, for instance, that the value 4 is chosen.
We then compare 4 with each of the other six values to obtain



{2, 1},4, {10, 5, 8,7} 
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We now sort the set {2, 1} to obtain 
1,2,4, {10, 5,8,7} 

Next we choose a value at random from {10, 5, 8, 7}—say 7 is chosen—and compare 
each of the other three values with 7 to obtain 


1,2,4, 5,7, {10, 8} 


Finally, we sort {10, 8} to end up with 


1,2, 4, 5, 7, 8, 10 

One measure of the effectiveness of this algorithm is the expected number of comparisons
that it makes. Let us denote by $M_n$ the expected number of comparisons needed
by the quick-sort algorithm to sort a set of $n$ distinct values. To obtain a recursion for
$M_n$ we condition on the rank of the initial value selected to obtain

$$M_n = \sum_{j=1}^{n} E[\text{number of comparisons} \mid \text{value selected is } j\text{th smallest}]\,\frac{1}{n}$$

Now, if the initial value selected is the $j$th smallest, then the set of values smaller than
it is of size $j - 1$, and the set of values greater than it is of size $n - j$. Hence, as $n - 1$
comparisons with the initial value chosen must be made, we see that

$$M_n = \sum_{j=1}^{n} \frac{n - 1 + M_{j-1} + M_{n-j}}{n} = n - 1 + \frac{2}{n}\sum_{k=1}^{n-1} M_k \qquad (\text{since } M_0 = 0)$$

or, equivalently,

$$nM_n = n(n - 1) + 2\sum_{k=1}^{n-1} M_k$$

To solve the preceding, note that upon replacing $n$ by $n + 1$ we obtain

$$(n + 1)M_{n+1} = (n + 1)n + 2\sum_{k=1}^{n} M_k$$

Hence, upon subtraction,

$$(n + 1)M_{n+1} - nM_n = 2n + 2M_n$$

or


$$(n + 1)M_{n+1} = (n + 2)M_n + 2n$$

Therefore,

$$\frac{M_{n+1}}{n+2} = \frac{2n}{(n+1)(n+2)} + \frac{M_n}{n+1}$$

Iterating this gives

$$\begin{aligned}
\frac{M_{n+1}}{n+2} &= \frac{2n}{(n+1)(n+2)} + \frac{2(n-1)}{n(n+1)} + \frac{M_{n-1}}{n} \\
&= \cdots \\
&= 2\sum_{k=0}^{n-1} \frac{n-k}{(n+1-k)(n+2-k)} \qquad \text{since } M_1 = 0
\end{aligned}$$

Hence,

$$M_{n+1} = 2(n+2)\sum_{k=0}^{n-1} \frac{n-k}{(n+1-k)(n+2-k)} = 2(n+2)\sum_{i=1}^{n} \frac{i}{(i+1)(i+2)}, \qquad n \geq 1$$

Using the identity $i/[(i+1)(i+2)] = 2/(i+2) - 1/(i+1)$, we can approximate
$M_{n+1}$ for large $n$ as follows:

$$\begin{aligned}
M_{n+1} &= 2(n+2)\left[\sum_{i=1}^{n} \frac{2}{i+2} - \sum_{i=1}^{n} \frac{1}{i+1}\right] \\
&\sim 2(n+2)\left[\int_{3}^{n+2} \frac{2}{x}\,dx - \int_{2}^{n+1} \frac{1}{x}\,dx\right] \\
&= 2(n+2)\left[2\log(n+2) - \log(n+1) + \log 2 - 2\log 3\right] \\
&= 2(n+2)\left[\log(n+2) + \log\frac{n+2}{n+1} + \log 2 - 2\log 3\right] \\
&\sim 2(n+2)\log(n+2)
\end{aligned}$$
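
The recursion and the closed-form sum of Example 3.16 can be compared numerically. The sketch below is only an illustration (the function names are ours, not the text's): it builds $M_n$ from $nM_n = n(n-1) + 2\sum_{k=1}^{n-1} M_k$ with exact fractions and checks it against $M_n = 2(n+1)\sum_{i=1}^{n-1} i/[(i+1)(i+2)]$, which is the formula for $M_{n+1}$ with $n$ shifted by one.

```python
from fractions import Fraction


def expected_comparisons(n_max):
    """Return [M_0, ..., M_{n_max}] from n*M_n = n(n-1) + 2*sum_{k=1}^{n-1} M_k."""
    M = [Fraction(0), Fraction(0)]  # M_0 = M_1 = 0
    running = Fraction(0)           # holds M_1 + ... + M_{n-1}
    for n in range(2, n_max + 1):
        running += M[n - 1]
        M.append(Fraction(n - 1) + 2 * running / n)
    return M[: n_max + 1]


def closed_form(n):
    """M_n = 2(n+1) * sum_{i=1}^{n-1} i / ((i+1)(i+2))."""
    return 2 * (n + 1) * sum(Fraction(i, (i + 1) * (i + 2)) for i in range(1, n))


if __name__ == "__main__":
    M = expected_comparisons(10)
    for n in (2, 3, 10):
        print(n, M[n], closed_form(n))
```

For example, $M_2 = 1$ (a single comparison) and $M_3 = 8/3$, in agreement with both expressions.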


Although we usually employ the conditional expectation identity to more easily 
enable us to compute an unconditional expectation, in our next example we show how 
it can sometimes be used to obtain the conditional expectation. 

Example 3.17 In the match problem of Example 2.31 involving n, n > 1, individuals, 
find the conditional expected number of matches given that the first person did not have 
a match. 

Solution: Let $X$ denote the number of matches, and let $X_1$ equal 1 if the first person
has a match and 0 otherwise. Then,

$$\begin{aligned}
E[X] &= E[X \mid X_1 = 0]P\{X_1 = 0\} + E[X \mid X_1 = 1]P\{X_1 = 1\} \\
&= E[X \mid X_1 = 0]\frac{n-1}{n} + E[X \mid X_1 = 1]\frac{1}{n}
\end{aligned}$$

But, from Example 2.31,

$$E[X] = 1$$

Moreover, given that the first person has a match, the expected number of matches
is equal to 1 plus the expected number of matches when $n - 1$ people select among
their own $n - 1$ hats, showing that

$$E[X \mid X_1 = 1] = 2$$

Therefore, we obtain the result

$$E[X \mid X_1 = 0] = \frac{n-2}{n-1}$$ ■
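
For small $n$ the result of Example 3.17 can be confirmed by brute force. The following sketch (an illustration only; the function name is hypothetical) enumerates all $n!$ equally likely hat assignments, keeps those in which the first person has no match, and averages the number of matches.

```python
from fractions import Fraction
from itertools import permutations


def cond_expected_matches(n):
    """E[X | X_1 = 0] by enumerating all n! hat assignments (requires n >= 2)."""
    total_matches = 0
    count = 0
    for perm in permutations(range(n)):
        if perm[0] != 0:  # the first person did not get his or her own hat
            total_matches += sum(1 for i, hat in enumerate(perm) if i == hat)
            count += 1
    return Fraction(total_matches, count)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, cond_expected_matches(n), Fraction(n - 2, n - 1))
```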

3.4.1 Computing Variances by Conditioning 

Conditional expectations can also be used to compute the variance of a random variable. 
Specifically, we can use 

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$

and then use conditioning to obtain both $E[X]$ and $E[X^2]$. We illustrate this technique
by determining the variance of a geometric random variable.

Example 3.18 (Variance of the Geometric Random Variable) Independent trials,
each resulting in a success with probability $p$, are performed in sequence. Let $N$ be the
trial number of the first success. Find $\mathrm{Var}(N)$.

Solution: Let $Y = 1$ if the first trial results in a success, and $Y = 0$ otherwise. Now,

$$\mathrm{Var}(N) = E[N^2] - (E[N])^2$$

To calculate $E[N^2]$ and $E[N]$ we condition on $Y$. For instance,

$$E[N^2] = E[E[N^2 \mid Y]]$$

However,

$$E[N^2 \mid Y = 1] = 1,$$

$$E[N^2 \mid Y = 0] = E[(1 + N)^2]$$

These two equations are true since if the first trial results in a success, then clearly
$N = 1$ and so $N^2 = 1$. On the other hand, if the first trial results in a failure, then
the total number of trials necessary for the first success will equal one (the first trial
that results in failure) plus the necessary number of additional trials. Since this latter
quantity has the same distribution as $N$, we get that $E[N^2 \mid Y = 0] = E[(1 + N)^2]$.
Hence, we see that

$$\begin{aligned}
E[N^2] &= E[N^2 \mid Y = 1]P\{Y = 1\} + E[N^2 \mid Y = 0]P\{Y = 0\} \\
&= p + E[(1 + N)^2](1 - p) \\
&= 1 + (1 - p)E[2N + N^2]
\end{aligned}$$

Since, as was shown in Example 3.10, $E[N] = 1/p$, this yields

$$E[N^2] = 1 + \frac{2(1-p)}{p} + (1 - p)E[N^2]$$

or

$$E[N^2] = \frac{2-p}{p^2}$$

Therefore,

$$\mathrm{Var}(N) = E[N^2] - (E[N])^2 = \frac{2-p}{p^2} - \left(\frac{1}{p}\right)^2 = \frac{1-p}{p^2}$$

Another way to use conditioning to obtain the variance of a random variable is to 
apply the conditional variance formula. The conditional variance of X given that Y = y 
is defined by 

$$\mathrm{Var}(X \mid Y = y) = E[(X - E[X \mid Y = y])^2 \mid Y = y]$$

That is, the conditional variance is defined in exactly the same manner as the ordinary
variance with the exception that all probabilities are determined conditional on the
event that $Y = y$. Expanding the right side of the preceding and taking expectation
term by term yields

$$\mathrm{Var}(X \mid Y = y) = E[X^2 \mid Y = y] - (E[X \mid Y = y])^2$$

Letting $\mathrm{Var}(X \mid Y)$ denote that function of $Y$ whose value when $Y = y$ is $\mathrm{Var}(X \mid Y = y)$,
we have the following result.

Proposition 3.1 (The Conditional Variance Formula)

$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y]) \qquad (3.7)$$

Proof.

$$\begin{aligned}
E[\mathrm{Var}(X \mid Y)] &= E[E[X^2 \mid Y] - (E[X \mid Y])^2] \\
&= E[E[X^2 \mid Y]] - E[(E[X \mid Y])^2] \\
&= E[X^2] - E[(E[X \mid Y])^2]
\end{aligned}$$

and

$$\begin{aligned}
\mathrm{Var}(E[X \mid Y]) &= E[(E[X \mid Y])^2] - (E[E[X \mid Y]])^2 \\
&= E[(E[X \mid Y])^2] - (E[X])^2
\end{aligned}$$

Therefore,

$$E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y]) = E[X^2] - (E[X])^2$$

which completes the proof. ■

Example 3.19 (The Variance of a Compound Random Variable) Let $X_1, X_2, \ldots$
be independent and identically distributed random variables with distribution $F$ having
mean $\mu$ and variance $\sigma^2$, and assume that they are independent of the nonnegative
integer valued random variable $N$. As noted in Example 3.10, where its expected value
was determined, the random variable $S = \sum_{i=1}^{N} X_i$ is called a compound random
variable. Find its variance.

Solution: Whereas we could obtain $E[S^2]$ by conditioning on $N$, let us instead use
the conditional variance formula. Now,

$$\begin{aligned}
\mathrm{Var}(S \mid N = n) &= \mathrm{Var}\left(\sum_{i=1}^{N} X_i \,\Big|\, N = n\right) \\
&= \mathrm{Var}\left(\sum_{i=1}^{n} X_i \,\Big|\, N = n\right) \\
&= \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) \\
&= n\sigma^2
\end{aligned}$$

By the same reasoning,

$$E[S \mid N = n] = n\mu$$

Therefore,

$$\mathrm{Var}(S \mid N) = N\sigma^2, \qquad E[S \mid N] = N\mu$$

and the conditional variance formula gives

$$\mathrm{Var}(S) = E[N\sigma^2] + \mathrm{Var}(N\mu) = \sigma^2 E[N] + \mu^2\,\mathrm{Var}(N)$$

If $N$ is a Poisson random variable, then $S = \sum_{i=1}^{N} X_i$ is called a compound Poisson
random variable. Because the variance of a Poisson random variable is equal to its
mean, it follows that for a compound Poisson random variable having $E[N] = \lambda$

$$\mathrm{Var}(S) = \lambda\sigma^2 + \lambda\mu^2 = \lambda E[X^2]$$

where $X$ has the distribution $F$. ■
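
The formula $\mathrm{Var}(S) = \sigma^2 E[N] + \mu^2\,\mathrm{Var}(N)$ can be checked exactly on a small example. In the sketch below the two probability mass functions are made-up illustrations, and the helper names are ours; the first function computes $\mathrm{Var}(S)$ by enumerating every outcome, the second applies the formula.

```python
from fractions import Fraction
from itertools import product


def compound_variance_exact(pmf_n, pmf_x):
    """Var(S), S = X_1 + ... + X_N, by enumerating every (N, X_1, ..., X_N)."""
    es = es2 = Fraction(0)
    for n, pn in pmf_n.items():
        for xs in product(pmf_x, repeat=n):  # n = 0 yields the empty tuple
            prob = pn
            for x in xs:
                prob *= pmf_x[x]
            s = sum(xs)
            es += prob * s
            es2 += prob * s * s
    return es2 - es**2


def compound_variance_formula(pmf_n, pmf_x):
    """sigma^2 * E[N] + mu^2 * Var(N)."""
    def mean(pmf):
        return sum(k * q for k, q in pmf.items())

    def var(pmf):
        return sum(k * k * q for k, q in pmf.items()) - mean(pmf) ** 2

    return var(pmf_x) * mean(pmf_n) + mean(pmf_x) ** 2 * var(pmf_n)


# Hypothetical distributions chosen only for illustration.
PMF_N = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
PMF_X = {1: Fraction(1, 3), 3: Fraction(2, 3)}

if __name__ == "__main__":
    print(compound_variance_exact(PMF_N, PMF_X))
    print(compound_variance_formula(PMF_N, PMF_X))
```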

Example 3.20 (The Variance in the Matching Rounds Problem) Consider the
matching rounds problem of Example 3.14, and let $V_n = \mathrm{Var}(R_n)$ denote the variance
of the number of rounds needed when there are initially $n$ people. Using the conditional
variance formula, we will show that

$$V_n = n, \qquad n \geq 2$$

The proof of the preceding is by induction on $n$. To begin, note that when $n = 2$ the
number of rounds needed is geometric with parameter $p = 1/2$ and so

$$V_2 = \frac{1-p}{p^2} = \frac{1/2}{(1/2)^2} = 2$$

So assume the induction hypothesis that

$$V_j = j, \qquad 2 \leq j < n$$

and now consider the case when there are $n$ individuals. If $X$ is the number of matches
in the first round then, conditional on $X$, the number of rounds $R_n$ is distributed as 1 plus
the number of rounds needed when there are initially $n - X$ individuals. Consequently,

$$\begin{aligned}
E[R_n \mid X] &= 1 + E[R_{n-X}] \\
&= 1 + n - X \qquad \text{by Example 3.14}
\end{aligned}$$

Also, with $V_0 = 0$,

$$\mathrm{Var}(R_n \mid X) = \mathrm{Var}(R_{n-X}) = V_{n-X}$$

Hence, by the conditional variance formula

$$\begin{aligned}
V_n &= E[\mathrm{Var}(R_n \mid X)] + \mathrm{Var}(E[R_n \mid X]) \\
&= E[V_{n-X}] + \mathrm{Var}(X) \\
&= \sum_{j=0}^{n} V_{n-j}P(X = j) + \mathrm{Var}(X) \\
&= V_n P(X = 0) + \sum_{j=1}^{n} V_{n-j}P(X = j) + \mathrm{Var}(X)
\end{aligned}$$

Because $P(X = n - 1) = 0$, it follows from the preceding and the induction hypothesis
that

$$\begin{aligned}
V_n &= V_n P(X = 0) + \sum_{j=1}^{n} (n - j)P(X = j) + \mathrm{Var}(X) \\
&= V_n P(X = 0) + n(1 - P(X = 0)) - E[X] + \mathrm{Var}(X)
\end{aligned}$$

As it is easily shown (see Example 2.31 and Exercise 72 of Chapter 2) that $E[X] =
\mathrm{Var}(X) = 1$, the preceding gives

$$V_n = V_n P(X = 0) + n(1 - P(X = 0))$$

thus proving the result.






3.5 Computing Probabilities by Conditioning 


Not only can we obtain expectations by first conditioning on an appropriate random 
variable, but we may also use this approach to compute probabilities. To see this, let E 
denote an arbitrary event and define the indicator random variable X by 


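$$X = \begin{cases} 1, & \text{if } E \text{ occurs} \\ 0, & \text{if } E \text{ does not occur} \end{cases}$$

It follows from the definition of $X$ that

$$E[X] = P(E),$$

$$E[X \mid Y = y] = P(E \mid Y = y), \quad \text{for any random variable } Y$$

Therefore, from Equations (3.2a) and (3.2b) we obtain

$$\begin{aligned}
P(E) &= \sum_{y} P(E \mid Y = y)P(Y = y), && \text{if } Y \text{ is discrete} \\
&= \int_{-\infty}^{\infty} P(E \mid Y = y) f_Y(y)\,dy, && \text{if } Y \text{ is continuous}
\end{aligned}$$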


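Example 3.21 Suppose that $X$ and $Y$ are independent continuous random variables
having densities $f_X$ and $f_Y$, respectively. Compute $P\{X < Y\}$.

Solution: Conditioning on the value of $Y$ yields

$$\begin{aligned}
P\{X < Y\} &= \int_{-\infty}^{\infty} P\{X < Y \mid Y = y\} f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} P\{X < y \mid Y = y\} f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} P\{X < y\} f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} F_X(y) f_Y(y)\,dy
\end{aligned}$$

where

$$F_X(y) = \int_{-\infty}^{y} f_X(x)\,dx$$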


Example 3.22 An insurance company supposes that the number of accidents that 
each of its policyholders will have in a year is Poisson distributed, with the mean of 
the Poisson depending on the policyholder. If the Poisson mean of a randomly chosen 
policyholder has a gamma distribution with density function 

$$g(\lambda) = \lambda e^{-\lambda}, \qquad \lambda \geq 0$$

what is the probability that a randomly chosen policyholder has exactly $n$ accidents
next year?

Solution: Let $X$ denote the number of accidents that a randomly chosen policyholder
has next year. Letting $Y$ be the Poisson mean number of accidents for this
policyholder, then conditioning on $Y$ yields

$$\begin{aligned}
P\{X = n\} &= \int_{0}^{\infty} P\{X = n \mid Y = \lambda\} g(\lambda)\,d\lambda \\
&= \int_{0}^{\infty} e^{-\lambda} \frac{\lambda^n}{n!}\, \lambda e^{-\lambda}\,d\lambda \\
&= \frac{1}{n!} \int_{0}^{\infty} \lambda^{n+1} e^{-2\lambda}\,d\lambda
\end{aligned}$$

However, because

$$h(\lambda) = \frac{2e^{-2\lambda}(2\lambda)^{n+1}}{(n+1)!}, \qquad \lambda > 0$$

is the density function of a gamma $(n + 2, 2)$ random variable, its integral is 1.
Therefore,

$$1 = \int_{0}^{\infty} \frac{2e^{-2\lambda}(2\lambda)^{n+1}}{(n+1)!}\,d\lambda = \frac{2^{n+2}}{(n+1)!} \int_{0}^{\infty} \lambda^{n+1} e^{-2\lambda}\,d\lambda$$

showing that

$$P\{X = n\} = \frac{n+1}{2^{n+2}}$$

Example 3.23 Suppose that the number of people who visit a yoga studio each day
is a Poisson random variable with mean $\lambda$. Suppose further that each person who visits
is, independently, female with probability $p$ or male with probability $1 - p$. Find the
joint probability that exactly $n$ women and $m$ men visit the studio today.

Solution: Let $N_1$ denote the number of women and $N_2$ the number of men who
visit the studio today. Also, let $N = N_1 + N_2$ be the total number of people who
visit. Conditioning on $N$ gives

$$P\{N_1 = n, N_2 = m\} = \sum_{i=0}^{\infty} P\{N_1 = n, N_2 = m \mid N = i\}P\{N = i\}$$

Because $P\{N_1 = n, N_2 = m \mid N = i\} = 0$ when $i \neq n + m$, the preceding equation
yields

$$P\{N_1 = n, N_2 = m\} = P\{N_1 = n, N_2 = m \mid N = n + m\}\, e^{-\lambda} \frac{\lambda^{n+m}}{(n+m)!}$$

Given that $n + m$ people visit it follows, because each of these $n + m$ is independently a
woman with probability $p$, that the conditional probability that $n$ of them are women
(and $m$ are men) is just the binomial probability of $n$ successes in $n + m$ trials.
Therefore,

$$\begin{aligned}
P\{N_1 = n, N_2 = m\} &= \binom{n+m}{n} p^n (1-p)^m e^{-\lambda} \frac{\lambda^{n+m}}{(n+m)!} \\
&= \frac{(n+m)!}{n!\,m!}\, p^n (1-p)^m e^{-\lambda p} e^{-\lambda(1-p)} \frac{\lambda^n \lambda^m}{(n+m)!} \\
&= e^{-\lambda p} \frac{(\lambda p)^n}{n!}\; e^{-\lambda(1-p)} \frac{(\lambda(1-p))^m}{m!}
\end{aligned}$$

Because the preceding joint probability mass function factors into two products, one
of which depends only on $n$ and the other only on $m$, it follows that $N_1$ and $N_2$ are
independent. Moreover, because

$$\begin{aligned}
P\{N_1 = n\} &= \sum_{m=0}^{\infty} P\{N_1 = n, N_2 = m\} \\
&= e^{-\lambda p} \frac{(\lambda p)^n}{n!} \sum_{m=0}^{\infty} e^{-\lambda(1-p)} \frac{(\lambda(1-p))^m}{m!} = e^{-\lambda p} \frac{(\lambda p)^n}{n!}
\end{aligned}$$

and, similarly,

$$P\{N_2 = m\} = e^{-\lambda(1-p)} \frac{(\lambda(1-p))^m}{m!}$$

we can conclude that $N_1$ and $N_2$ are independent Poisson random variables with
respective means $\lambda p$ and $\lambda(1 - p)$. Therefore, this example establishes the important
result that when each of a Poisson number of events is independently classified either
as being type 1 with probability $p$ or type 2 with probability $1 - p$, then the numbers
of type 1 and type 2 events are independent Poisson random variables. ■
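
The factorization in Example 3.23 is easy to confirm numerically. The sketch below (function names and parameter values are illustrative assumptions) computes the joint probability by conditioning on the total, as in the solution, and compares it with the product of two Poisson probabilities having means $\lambda p$ and $\lambda(1-p)$.

```python
import math


def joint_pmf_by_conditioning(n, m, lam, p):
    """P{N1 = n, N2 = m} = P{N1 = n | N = n + m} * P{N = n + m}."""
    total = n + m
    binomial = math.comb(total, n) * p**n * (1 - p) ** m
    poisson_total = math.exp(-lam) * lam**total / math.factorial(total)
    return binomial * poisson_total


def poisson_pmf(k, mean):
    """Poisson probability mass function."""
    return math.exp(-mean) * mean**k / math.factorial(k)


if __name__ == "__main__":
    lam, p = 4.0, 0.3  # illustrative values
    for n, m in [(0, 0), (2, 1), (3, 5)]:
        lhs = joint_pmf_by_conditioning(n, m, lam, p)
        rhs = poisson_pmf(n, lam * p) * poisson_pmf(m, lam * (1 - p))
        print(n, m, lhs, rhs)
```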

The result of Example 3.23 generalizes to the case where each of a Poisson distributed
number of events, $N$, with mean $\lambda$ is independently classified as being one
of $k$ types, with the probability that it is type $i$ being $p_i$, $i = 1, \ldots, k$, $\sum_{i=1}^{k} p_i = 1$.
If $N_i$ is the number that are classified as type $i$, then $N_1, \ldots, N_k$ are independent
Poisson random variables with respective means $\lambda p_1, \ldots, \lambda p_k$. This follows, since for
$n = \sum_{i=1}^{k} n_i$

$$\begin{aligned}
P(N_1 = n_1, \ldots, N_k = n_k) &= P(N_1 = n_1, \ldots, N_k = n_k \mid N = n)P(N = n) \\
&= \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}\, e^{-\lambda} \lambda^n / n! \\
&= \prod_{i=1}^{k} e^{-\lambda p_i} (\lambda p_i)^{n_i} / n_i!
\end{aligned}$$

where the second equality used that, given a total of $n$ events, the numbers of each type
has a multinomial distribution with parameters $(n, p_1, \ldots, p_k)$.

Example 3.24 (The Distribution of the Sum of Independent Bernoulli Random
Variables) Let $X_1, \ldots, X_n$ be independent Bernoulli random variables, with $X_i$ having
parameter $p_i$, $i = 1, \ldots, n$. That is, $P\{X_i = 1\} = p_i$, $P\{X_i = 0\} = q_i = 1 - p_i$.
Suppose we want to compute the probability mass function of their sum, $X_1 + \cdots + X_n$.
To do so, we will recursively obtain the probability mass function of $X_1 + \cdots + X_k$,
first for $k = 1$, then $k = 2$, and on up to $k = n$. To begin, let

$$P_k(j) = P\{X_1 + \cdots + X_k = j\}$$

and note that

$$P_k(k) = \prod_{i=1}^{k} p_i, \qquad P_k(0) = \prod_{i=1}^{k} q_i$$

For $0 < j < k$, conditioning on $X_k$ yields the recursion

$$\begin{aligned}
P_k(j) &= P\{X_1 + \cdots + X_k = j \mid X_k = 1\}p_k + P\{X_1 + \cdots + X_k = j \mid X_k = 0\}q_k \\
&= P\{X_1 + \cdots + X_{k-1} = j - 1 \mid X_k = 1\}p_k + P\{X_1 + \cdots + X_{k-1} = j \mid X_k = 0\}q_k \\
&= P\{X_1 + \cdots + X_{k-1} = j - 1\}p_k + P\{X_1 + \cdots + X_{k-1} = j\}q_k \\
&= p_k P_{k-1}(j - 1) + q_k P_{k-1}(j)
\end{aligned}$$

Starting with $P_1(1) = p_1$, $P_1(0) = q_1$, the preceding equations can be recursively
solved to obtain the functions $P_2(j)$, $P_3(j)$, up to $P_n(j)$. ■
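
The recursion of Example 3.24 translates directly into a short dynamic program. The following sketch is one possible implementation, not the text's own; as a small convenience it starts from the empty sum ($P_0(0) = 1$) rather than from $P_1$, which produces the same values for $k \geq 1$.

```python
from fractions import Fraction


def bernoulli_sum_pmf(ps):
    """Return [P_n(0), ..., P_n(n)] for independent Bernoulli(p_i) variables."""
    pmf = [Fraction(1)]  # the empty sum equals 0 with probability 1
    for p in ps:
        p = Fraction(p)
        q = 1 - p
        new = [Fraction(0)] * (len(pmf) + 1)
        for j, prob in enumerate(pmf):
            new[j] += q * prob      # X_k = 0: the sum stays at j
            new[j + 1] += p * prob  # X_k = 1: the sum moves to j + 1
        pmf = new
    return pmf


if __name__ == "__main__":
    print(bernoulli_sum_pmf([Fraction(1, 2), Fraction(1, 3)]))
```

With $p_1 = 1/2$ and $p_2 = 1/3$ this gives probabilities $1/3$, $1/2$, and $1/6$ for sums 0, 1, and 2.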

Example 3.25 (The Best Prize Problem) Suppose that we are to be presented with 
n distinct prizes in sequence. After being presented with a prize we must immediately 
decide whether to accept it or reject it and consider the next prize. The only information 
we are given when deciding whether to accept a prize is the relative rank of that prize 
compared to ones already seen. That is, for instance, when the fifth prize is presented 
we learn how it compares with the first four prizes already seen. Suppose that once 
a prize is rejected it is lost, and that our objective is to maximize the probability of 
obtaining the best prize. Assuming that all $n!$ orderings of the prizes are equally likely,
how well can we do?

Solution: Rather surprisingly, we can do quite well. To see this, fix a value $k$, $0 \leq
k < n$, and consider the strategy that rejects the first $k$ prizes and then accepts the first
one that is better than all of those first $k$. Let $P_k(\text{best})$ denote the probability that the
best prize is selected when this strategy is employed. To compute this probability,
condition on $X$, the position of the best prize. This gives

$$P_k(\text{best}) = \sum_{i=1}^{n} P_k(\text{best} \mid X = i)P(X = i) = \frac{1}{n}\sum_{i=1}^{n} P_k(\text{best} \mid X = i)$$

Now, if the overall best prize is among the first $k$, then no prize is ever selected under
the strategy considered. On the other hand, if the best prize is in position $i$, where
$i > k$, then the best prize will be selected if the best of the first $k$ prizes is also the best
of the first $i - 1$ prizes (for then none of the prizes in positions $k + 1, k + 2, \ldots, i - 1$
would be selected). Hence, we see that

$$P_k(\text{best} \mid X = i) = 0, \qquad \text{if } i \leq k$$

$$\begin{aligned}
P_k(\text{best} \mid X = i) &= P\{\text{best of first } i - 1 \text{ is among the first } k\} \\
&= k/(i - 1), \qquad \text{if } i > k
\end{aligned}$$

From the preceding, we obtain

$$\begin{aligned}
P_k(\text{best}) &= \frac{k}{n} \sum_{i=k+1}^{n} \frac{1}{i-1} \\
&\approx \frac{k}{n} \int_{k}^{n-1} \frac{1}{x}\,dx \\
&= \frac{k}{n} \log\left(\frac{n-1}{k}\right) \\
&\approx \frac{k}{n} \log\left(\frac{n}{k}\right)
\end{aligned}$$

Now, if we consider the function

$$g(x) = \frac{x}{n} \log\left(\frac{n}{x}\right)$$

then

$$g'(x) = \frac{1}{n} \log\left(\frac{n}{x}\right) - \frac{1}{n}$$

and so

$$g'(x) = 0 \;\Rightarrow\; \log(n/x) = 1 \;\Rightarrow\; x = n/e$$

Thus, since $P_k(\text{best}) \approx g(k)$, we see that the best strategy of the type considered is
to let the first $n/e$ prizes go by and then accept the first one to appear that is better
than all of those. In addition, since $g(n/e) = 1/e$, the probability that this strategy
selects the best prize is approximately $1/e \approx 0.36788$.
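
To see how close the approximation is for a moderate number of prizes, the exact sum $P_k(\text{best}) = (k/n)\sum_{i=k+1}^{n} 1/(i-1)$ can be evaluated for every cutoff. The sketch below is illustrative only (the function names are ours) and considers cutoffs $1 \leq k < n$; the case $k = 0$, which simply accepts the first prize, succeeds with probability $1/n$.

```python
import math
from fractions import Fraction


def prob_best(n, k):
    """Exact P_k(best) = (k/n) * sum_{i=k+1}^{n} 1/(i-1), for 1 <= k < n."""
    return Fraction(k, n) * sum(Fraction(1, i - 1) for i in range(k + 1, n + 1))


def best_cutoff(n):
    """Cutoff k in 1..n-1 that maximizes the exact probability."""
    return max(range(1, n), key=lambda k: prob_best(n, k))


if __name__ == "__main__":
    n = 100
    k = best_cutoff(n)
    print(k, n / math.e, float(prob_best(n, k)), 1 / math.e)
```

For $n = 3$ the best cutoff is $k = 1$, which obtains the best prize with probability exactly $1/2$.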

Remark Most students are quite surprised by the size of the probability of obtaining 
the best prize, thinking that this probability would be close to 0 when n is large. 
However, even without going through the calculations, a little thought reveals that the 
probability of obtaining the best prize can be made to be reasonably large. Consider 
the strategy of letting half of the prizes go by, and then selecting the first one to appear 
that is better than all of those. The probability that a prize is actually selected is the 
probability that the overall best is among the second half and this is 1/2. In addition,
given that a prize is selected, at the time of selection that prize would have been the best
of more than $n/2$ prizes to have appeared, and would thus have probability of at least
1/2 of being the overall best. Hence, the strategy of letting the first half of all prizes
go by and then accepting the first one that is better than all of those prizes results in a
probability greater than 1/4 of obtaining the best prize. ■

Example 3.26 At a party n men take off their hats. The hats are then mixed up and 
each man randomly selects one. We say that a match occurs if a man selects his own hat. 
What is the probability of no matches? What is the probability of exactly k matches? 

Solution: Let $E$ denote the event that no matches occur, and to make explicit the
dependence on $n$, write $P_n = P(E)$. We start by conditioning on whether or not the
first man selects his own hat; call these events $M$ and $M^c$. Then

$$P_n = P(E) = P(E \mid M)P(M) + P(E \mid M^c)P(M^c)$$

Clearly, $P(E \mid M) = 0$, and so

$$P_n = P(E \mid M^c)\,\frac{n-1}{n} \qquad (3.8)$$

Now, $P(E \mid M^c)$ is the probability of no matches when $n - 1$ men select from a set of
$n - 1$ hats that does not contain the hat of one of these men. This can happen in either
of two mutually exclusive ways. Either there are no matches and the extra man does
not select the extra hat (this being the hat of the man that chose first), or there are no
matches and the extra man does select the extra hat. The probability of the first of
these events is just $P_{n-1}$, which is seen by regarding the extra hat as “belonging” to
the extra man. Because the second event has probability $[1/(n - 1)]P_{n-2}$, we have

$$P(E \mid M^c) = P_{n-1} + \frac{1}{n-1} P_{n-2}$$

and thus, from Equation (3.8),

$$P_n = \frac{n-1}{n} P_{n-1} + \frac{1}{n} P_{n-2}$$

or, equivalently,

$$P_n - P_{n-1} = -\frac{1}{n}(P_{n-1} - P_{n-2}) \qquad (3.9)$$

However, because $P_n$ is the probability of no matches when $n$ men select among
their own hats, we have

$$P_1 = 0, \qquad P_2 = \frac{1}{2}$$

and so, from Equation (3.9),

$$P_3 - P_2 = -\frac{(P_2 - P_1)}{3} = -\frac{1}{3!} \quad \text{or} \quad P_3 = \frac{1}{2!} - \frac{1}{3!}$$

$$P_4 - P_3 = -\frac{(P_3 - P_2)}{4} = \frac{1}{4!} \quad \text{or} \quad P_4 = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!}$$

and, in general, we see that

$$P_n = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^n}{n!}$$

To obtain the probability of exactly $k$ matches, we consider any fixed group of $k$
men. The probability that they, and only they, select their own hats is

$$\frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{n-(k-1)}\, P_{n-k} = \frac{(n-k)!}{n!}\, P_{n-k}$$

where $P_{n-k}$ is the conditional probability that the other $n - k$ men, selecting among
their own hats, have no matches. Because there are $\binom{n}{k}$ choices of a set of $k$ men, the
desired probability of exactly $k$ matches is

$$\frac{P_{n-k}}{k!} = \frac{\dfrac{1}{2!} - \dfrac{1}{3!} + \cdots + \dfrac{(-1)^{n-k}}{(n-k)!}}{k!}$$

which, for $n$ large, is approximately equal to $e^{-1}/k!$.
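
The recursion $P_n = \frac{n-1}{n}P_{n-1} + \frac{1}{n}P_{n-2}$ and the formula $P_{n-k}/k!$ can both be checked against direct enumeration for small $n$. In the sketch below the function names are illustrative, and the convention $P_0 = 1$ (no people, hence no matches) is an assumption added so that the case $k = n$ is covered; it is consistent with $P_2 = 1/2$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def no_match_probs(n_max):
    """[P_0, ..., P_{n_max}] via P_n = ((n-1)/n) P_{n-1} + (1/n) P_{n-2}."""
    P = [Fraction(1), Fraction(0)]  # assumed P_0 = 1; P_1 = 0
    for n in range(2, n_max + 1):
        P.append(Fraction(n - 1, n) * P[n - 1] + Fraction(1, n) * P[n - 2])
    return P[: n_max + 1]


def prob_exactly_k_matches(n, k):
    """P{exactly k matches among n men} = P_{n-k} / k!."""
    return no_match_probs(n - k)[n - k] / factorial(k)


def brute_force(n, k):
    """Same probability by enumerating all n! hat assignments."""
    hits = sum(
        1
        for perm in permutations(range(n))
        if sum(i == hat for i, hat in enumerate(perm)) == k
    )
    return Fraction(hits, factorial(n))


if __name__ == "__main__":
    for k in range(6):
        print(k, prob_exactly_k_matches(5, k), brute_force(5, k))
```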

Remark The recursive equation, Equation (3.9), could also have been obtained by
using the concept of a cycle, where we say that the sequence of distinct individuals
$i_1, i_2, \ldots, i_k$ constitutes a cycle if $i_1$ chooses $i_2$’s hat, $i_2$ chooses $i_3$’s hat, ..., $i_{k-1}$
chooses $i_k$’s hat, and $i_k$ chooses $i_1$’s hat. Note that every individual is part of a cycle,
and that a cycle of size $k = 1$ occurs when someone chooses his or her own hat. With
$E$ being, as before, the event that no matches occur, it follows upon conditioning on
the size of the cycle containing a specified person, say person 1, that

$$P_n = P(E) = \sum_{k=1}^{n} P(E \mid C = k)P(C = k) \qquad (3.10)$$

where $C$ is the size of the cycle that contains person 1. Now, call person 1 the first
person, and note that $C = k$ if the first person does not choose 1’s hat; the person
whose hat was chosen by the first person (call this person the second person) does
not choose 1’s hat; the person whose hat was chosen by the second person (call this
person the third person) does not choose 1’s hat; ..., the person whose hat was chosen
by the $(k - 1)$st person does choose 1’s hat. Consequently,

$$P(C = k) = \frac{n-1}{n} \cdot \frac{n-2}{n-1} \cdots \frac{n-k+1}{n-k+2} \cdot \frac{1}{n-k+1} = \frac{1}{n} \qquad (3.11)$$

That is, the size of the cycle that contains a specified person is equally likely to be any
of the values $1, 2, \ldots, n$. Moreover, since $C = 1$ means that 1 chooses his or her own
hat, it follows that

$$P(E \mid C = 1) = 0$$

On the other hand, if $C = k$, then the set of hats chosen by the $k$ individuals in this
cycle is exactly the set of hats of these individuals. Hence, conditional on $C = k$,
the problem reduces to determining the probability of no matches when $n - k$ people
randomly choose among their own $n - k$ hats. Therefore, for $k > 1$

$$P(E \mid C = k) = P_{n-k} \qquad (3.12)$$

Substituting (3.11)-(3.13) back into Equation (3.10) gives

$$P_n = \frac{1}{n} \sum_{k=2}^{n} P_{n-k} \qquad (3.13)$$

which is easily shown to be equivalent to Equation (3.9). ■

Example 3.27 (The Ballot Problem) In an election, candidate A receives n votes, 
and candidate B receives m votes where n > m. Assuming that all orderings are 
equally likely, show that the probability that A is always ahead in the count of votes is 
(n — m)/(n + m). 

Solution: Let $P_{n,m}$ denote the desired probability. By conditioning on which candidate
receives the last vote counted we have

$$P_{n,m} = P\{A \text{ always ahead} \mid A \text{ receives last vote}\}\frac{n}{n+m} + P\{A \text{ always ahead} \mid B \text{ receives last vote}\}\frac{m}{n+m}$$

Now, given that $A$ receives the last vote, we can see that the probability that $A$ is
always ahead is the same as if $A$ had received a total of $n - 1$ and $B$ a total of $m$
votes. Because a similar result is true when we are given that $B$ receives the last vote,
we see from the preceding that

$$P_{n,m} = \frac{n}{n+m} P_{n-1,m} + \frac{m}{m+n} P_{n,m-1} \qquad (3.14)$$

We can now prove that $P_{n,m} = (n - m)/(n + m)$ by induction on $n + m$. As it is
true when $n + m = 1$, that is, $P_{1,0} = 1$, assume it whenever $n + m = k$. Then when
$n + m = k + 1$, we have by Equation (3.14) and the induction hypothesis that

$$\begin{aligned}
P_{n,m} &= \frac{n}{n+m} \cdot \frac{n-1-m}{n-1+m} + \frac{m}{m+n} \cdot \frac{n-m+1}{n+m-1} \\
&= \frac{n-m}{n+m}
\end{aligned}$$

and the result is proven. ■
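
Equation (3.14) can be run as a memoized recursion and compared with the closed form $(n - m)/(n + m)$. The sketch below is only an illustration; the boundary cases are our own way of starting the recursion ($P_{n,0} = 1$ because $A$ receives every vote, and $P_{n,m} = 0$ when $m \geq n$ because $A$ cannot be strictly ahead after the final vote).

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def ballot(n, m):
    """P_{n,m}: probability that A (n votes) is always strictly ahead of B (m votes)."""
    if m >= n:
        return Fraction(0)  # A is not ahead once all votes are counted
    if m == 0:
        return Fraction(1)  # every vote is for A
    return (Fraction(n, n + m) * ballot(n - 1, m)
            + Fraction(m, n + m) * ballot(n, m - 1))


if __name__ == "__main__":
    for n, m in [(2, 1), (5, 3), (10, 4)]:
        print(n, m, ballot(n, m), Fraction(n - m, n + m))
```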

The ballot problem has some interesting applications. For example, consider successive
flips of a coin that always land on “heads” with probability $p$, and let us determine
the probability distribution of the first time, after beginning, that the total number of
heads is equal to the total number of tails. The probability that the first time this occurs
is at time $2n$ can be obtained by first conditioning on the total number of heads in the
first $2n$ trials. This yields

$$P\{\text{first time equal} = 2n\} = P\{\text{first time equal} = 2n \mid n \text{ heads in first } 2n\} \binom{2n}{n} p^n (1-p)^n$$

Now, given a total of $n$ heads in the first $2n$ flips we can see that all possible orderings of
the $n$ heads and $n$ tails are equally likely, and thus the preceding conditional probability
is equivalent to the probability that in an election, in which each candidate receives $n$
votes, one of the candidates is always ahead in the counting until the last vote (which
ties them). But by conditioning on whomever receives the last vote, we see that this is
just the probability in the ballot problem when $m = n - 1$. Hence,

$$P\{\text{first time equal} = 2n\} = P_{n,n-1} \binom{2n}{n} p^n (1-p)^n = \frac{\binom{2n}{n} p^n (1-p)^n}{2n - 1}$$

Suppose now that we wanted to determine the probability that the first time there
are $i$ more heads than tails occurs after the $(2n + i)$th flip. Now, in order for this to be
the case, the following two events must occur:

(a) The first $2n + i$ tosses result in $n + i$ heads and $n$ tails; and

(b) The order in which the $n + i$ heads and $n$ tails occur is such that the number of
heads is never $i$ more than the number of tails until after the final flip.

Now, it is easy to see that event (b) will occur if and only if the order of appearance of
the $n + i$ heads and $n$ tails is such that starting from the final flip and working backwards
heads is always in the lead. For instance, if there are 4 heads and 2 tails ($n = 2$, $i = 2$),
then the outcome _ _ _ _ T H would not suffice because there would have been 2 more
heads than tails sometime before the sixth flip (since the first 4 flips resulted in 2 more
heads than tails).

Now, the probability of the event specified in (a) is just the binomial probability of
getting $n + i$ heads and $n$ tails in $2n + i$ flips of the coin.

We must now determine the conditional probability of the event specified in (b) given
that there are $n + i$ heads and $n$ tails in the first $2n + i$ flips. To do so, note first that
given that there are a total of $n + i$ heads and $n$ tails in the first $2n + i$ flips, all possible
orderings of these flips are equally likely. As a result, the conditional probability of (b)
given (a) is just the probability that a random ordering of $n + i$ heads and $n$ tails will,
when counted in reverse order, always have more heads than tails. Since all reverse
orderings are also equally likely, it follows from the ballot problem that this conditional
probability is $i/(2n + i)$.

That is, we have shown that

$$P\{a\} = \binom{2n+i}{n} p^{n+i} (1-p)^n, \qquad P\{b \mid a\} = \frac{i}{2n+i}$$

and so

$$P\{\text{first time heads leads by } i \text{ is after flip } 2n + i\} = \binom{2n+i}{n} p^{n+i} (1-p)^n\, \frac{i}{2n+i}$$

Example 3.28 Let $U_1, U_2, \ldots$ be a sequence of independent uniform (0, 1) random
variables, and let

$$N = \min\{n \geq 2 : U_n > U_{n-1}\}$$

and

$$M = \min\{n \geq 1 : U_1 + \cdots + U_n > 1\}$$

That is, $N$ is the index of the first uniform random variable that is larger than its
immediate predecessor, and $M$ is the number of uniform random variables we need
to sum to exceed 1. Surprisingly, $N$ and $M$ have the same probability distribution, and
their common mean is $e$!

Solution: It is easy to find the distribution of $N$. Since all $n!$ possible orderings of
$U_1, \ldots, U_n$ are equally likely, we have

$$P\{N > n\} = P\{U_1 > U_2 > \cdots > U_n\} = 1/n!$$

To show that $P\{M > n\} = 1/n!$, we will use mathematical induction. However, to
give ourselves a stronger result to use as the induction hypothesis, we will prove the
stronger result that for $0 < x \leq 1$, $P\{M(x) > n\} = x^n/n!$, $n \geq 1$, where

$$M(x) = \min\{n \geq 1 : U_1 + \cdots + U_n > x\}$$

is the minimum number of uniforms that need be summed to exceed $x$. To prove that
$P\{M(x) > n\} = x^n/n!$, note first that it is true for $n = 1$ since

$$P\{M(x) > 1\} = P\{U_1 \leq x\} = x$$

So assume that for all $0 < x \leq 1$, $P\{M(x) > n\} = x^n/n!$. To determine $P\{M(x) >
n + 1\}$, condition on $U_1$ to obtain

$$\begin{aligned}
P\{M(x) > n + 1\} &= \int_{0}^{1} P\{M(x) > n + 1 \mid U_1 = y\}\,dy \\
&= \int_{0}^{x} P\{M(x) > n + 1 \mid U_1 = y\}\,dy \\
&= \int_{0}^{x} P\{M(x - y) > n\}\,dy \\
&= \int_{0}^{x} \frac{(x - y)^n}{n!}\,dy \qquad \text{by the induction hypothesis} \\
&= \int_{0}^{x} \frac{u^n}{n!}\,du \\
&= \frac{x^{n+1}}{(n+1)!}
\end{aligned}$$

where the third equality of the preceding follows from the fact that given $U_1 = y$,
$M(x)$ is distributed as 1 plus the number of uniforms that need be summed to exceed
$x - y$. Thus, the induction is complete and we have shown that for $0 < x \leq 1$, $n \geq 1$,

$$P\{M(x) > n\} = x^n/n!$$

Letting $x = 1$ shows that $N$ and $M$ have the same distribution. Finally, we have

$$E[M] = E[N] = \sum_{n=0}^{\infty} P\{N > n\} = \sum_{n=0}^{\infty} 1/n! = e$$ ■
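
A short simulation makes the conclusion of Example 3.28 easy to believe. The sketch below is an illustration under our own choices of function names, trial count, and seed; it draws independent uniforms with Python's standard `random` module and estimates $E[N]$ and $E[M]$, both of which should be close to $e \approx 2.718$.

```python
import random


def sample_N(rng):
    """Index of the first uniform that exceeds its immediate predecessor."""
    prev, n = rng.random(), 1
    while True:
        cur = rng.random()
        n += 1
        if cur > prev:
            return n
        prev = cur


def sample_M(rng):
    """Number of uniforms needed for the running sum to exceed 1."""
    total, n = 0.0, 0
    while total <= 1:
        total += rng.random()
        n += 1
    return n


def estimate_means(trials=200_000, seed=0):
    """Monte Carlo estimates of (E[N], E[M])."""
    rng = random.Random(seed)
    mean_n = sum(sample_N(rng) for _ in range(trials)) / trials
    mean_m = sum(sample_M(rng) for _ in range(trials)) / trials
    return mean_n, mean_m


if __name__ == "__main__":
    print(estimate_means())
```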


Example 3.29 Let $X_1, X_2, \ldots$ be independent continuous random variables with a
common distribution function $F$ and density $f = F'$, and suppose that they are to be
observed one at a time in sequence. Let

$$N = \min\{n \geq 2 : X_n = \text{second largest of } X_1, \ldots, X_n\}$$

and let

$$M = \min\{n \geq 2 : X_n = \text{second smallest of } X_1, \ldots, X_n\}$$

Which random variable tends to be larger: $X_N$, the first random variable which when
observed is the second largest of those that have been seen, or $X_M$, the first one that on
observation is the second smallest to have been seen?

Solution: To calculate the probability density function of $X_N$, it is natural to condition
on the value of $N$; so let us start by determining its probability mass function.
Now, if we let

$$A_i = \{X_i \neq \text{second largest of } X_1, \ldots, X_i\}, \qquad i \geq 2$$

then, for $n \geq 2$,

$$P\{N = n\} = P(A_2 A_3 \cdots A_{n-1} A_n^c)$$

Since the $X_i$ are independent and identically distributed it follows that, for any $m \geq 1$,
knowing the rank ordering of the variables $X_1, \ldots, X_m$ yields no information about
the set of $m$ values $\{X_1, \ldots, X_m\}$. That is, for instance, knowing that $X_1 < X_2$ gives
us no information about the values of $\min(X_1, X_2)$ or $\max(X_1, X_2)$. It follows from
this that the events $A_i$, $i \geq 2$ are independent. Also, since $X_i$ is equally likely to be
the largest, or the second largest, ..., or the $i$th largest of $X_1, \ldots, X_i$ it follows that
$P\{A_i\} = (i - 1)/i$, $i \geq 2$. Therefore, we see that

$$P\{N = n\} = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} \cdots \frac{n-2}{n-1} \cdot \frac{1}{n} = \frac{1}{n(n-1)}$$

Hence, conditioning on $N$ yields that the probability density function of $X_N$ is as
follows:

$$f_{X_N}(x) = \sum_{n=2}^{\infty} \frac{1}{n(n-1)} f_{X_N \mid N}(x \mid n)$$

Now, since the ordering of the variables $X_1, \ldots, X_n$ is independent of the set of values
$\{X_1, \ldots, X_n\}$, it follows that the event $\{N = n\}$ is independent of $\{X_1, \ldots, X_n\}$.
From this, it follows that the conditional distribution of $X_N$ given that $N = n$
is equal to the distribution of the second largest from a set of $n$ random variables
having distribution $F$. Thus, using the results of Example 2.38 concerning the density
function of such a random variable, we obtain

$$\begin{aligned}
f_{X_N}(x) &= \sum_{n=2}^{\infty} \frac{1}{n(n-1)} \cdot \frac{n!}{(n-2)!\,1!}\, (F(x))^{n-2} f(x)(1 - F(x)) \\
&= f(x)(1 - F(x)) \sum_{i=0}^{\infty} (F(x))^i \\
&= f(x)
\end{aligned}$$

Thus, rather surprisingly, $X_N$ has the same distribution as $X_1$, namely, $F$. Also, if
we now let $W_i = -X_i$, $i \geq 1$, then $W_M$ will be the value of the first $W_i$ that
on observation is the second largest of all those that have been seen. Hence, by the
preceding, it follows that $W_M$ has the same distribution as $W_1$. That is, $-X_M$ has
the same distribution as $-X_1$, and so $X_M$ also has distribution $F$. In other words,
whether we stop at the first random variable that is the second largest of all those
presently observed, or we stop at the first one that is the second smallest of all those
presently observed, we will end up with a random variable having distribution $F$.

Whereas the preceding result is quite surprising, it is a special case of a general
result known as Ignatov's theorem, which yields even more surprises. For instance,
for $k \geq 1$, let

$$N_k = \min\{n \geq k : X_n = k\text{th largest of } X_1, \ldots, X_n\}$$

Therefore, $N_2$ is what we previously called $N$, and $X_{N_k}$ is the first random variable that
upon observation is the $k$th largest of all those observed up to this point. It can then be
shown by an argument similar to the preceding one that $X_{N_k}$ has distribution function
$F$ for all $k$ (see Exercise 82 at the end of this chapter). In addition, it can be shown that
the random variables $X_{N_k}$, $k \geq 1$, are independent. (A statement and proof of Ignatov's
theorem in the case of discrete random variables are given in Section 3.6.6.) ■
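
The identity $P\{N = n\} = 1/(n(n-1))$ depends only on the relative order of the first $n$ observations, so it can be checked exactly by enumerating permutations. The following is a minimal sketch under that assumption; the function names `first_second_largest_index` and `pmf_of_N` are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import permutations


def first_second_largest_index(values):
    """Return N = min{n >= 2 : values[n-1] is the second largest of the
    first n values}, or None if no such n exists within the sequence."""
    for n in range(2, len(values) + 1):
        prefix = values[:n]
        if sorted(prefix)[-2] == values[n - 1]:
            return n
    return None


def pmf_of_N(m):
    """Exact P{N = n}, n = 2..m, when the rank ordering of m distinct
    values is uniform over all m! permutations (the i.i.d. continuous case)."""
    counts = {}
    total = 0
    for perm in permutations(range(m)):
        total += 1
        n = first_second_largest_index(perm)
        if n is not None:
            counts[n] = counts.get(n, 0) + 1
    return {n: Fraction(c, total) for n, c in sorted(counts.items())}


if __name__ == "__main__":
    for n, prob in pmf_of_N(6).items():
        print(n, prob, Fraction(1, n * (n - 1)))
```

Enumeration is only feasible for small $m$, but it confirms the product formula without any sampling noise. Simulating $N$ directly is less convenient because $P\{N > n\} = 1/n$, so $N$ has infinite mean.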

Example 3.30 A population consists of $m$ families. Let $X_j$ denote the size of family
$j$, and suppose that $X_1, \ldots, X_m$ are independent random variables having the common
probability mass function

$$p_k = P(X_j = k), \quad \sum_{k=1}^{\infty} p_k = 1$$

with mean $\mu = \sum_k k p_k$. Suppose a member of the population is randomly chosen, in
that the selection is equally likely to be any of the members of the population, and let
$S_i$ be the event that the selected individual is from a family of size $i$. Argue that

$$P(S_i) \to \frac{i p_i}{\mu} \quad \text{as } m \to \infty$$

Solution: A heuristic argument for the preceding formula is that because each
family is of size $i$ with probability $p_i$, it follows that there are approximately $m p_i$
families of size $i$ when $m$ is large. Thus, $i m p_i$ members of the population come from
a family of size $i$, implying that the probability that the selected individual is from
a family of size $i$ is approximately $\frac{i m p_i}{\sum_j j m p_j} = \frac{i p_i}{\mu}$.

For a more formal argument, let $N_i$ denote the number of families that are of
size $i$. That is,

$$N_i = \text{number}\{k : k = 1, \ldots, m : X_k = i\}$$

Then, conditional on $\mathbf{X} = (X_1, \ldots, X_m)$,

$$P(S_i \mid \mathbf{X}) = \frac{i N_i}{\sum_{k=1}^{m} X_k}$$

Hence,

$$P(S_i) = E[P(S_i \mid \mathbf{X})] = E\left[\frac{i N_i}{\sum_{k=1}^{m} X_k}\right] = E\left[\frac{i N_i / m}{\sum_{k=1}^{m} X_k / m}\right]$$

Because each family is independently of size $i$ with probability $p_i$, it follows by the
strong law of large numbers that $N_i/m$, the fraction of families that are of size $i$,
converges to $p_i$ as $m \to \infty$. Also by the strong law of large numbers,
$\sum_{k=1}^{m} X_k / m \to E[X] = \mu$ as $m \to \infty$. Consequently, with probability 1,

$$\frac{i N_i / m}{\sum_{k=1}^{m} X_k / m} \to \frac{i p_i}{\mu} \quad \text{as } m \to \infty$$

Because the random variable $\frac{i N_i}{\sum_{k=1}^{m} X_k}$ converges to $\frac{i p_i}{\mu}$, so does its
expectation, which proves the result. (While it is not always the case that
$\lim_{m \to \infty} Y_m = c$ implies that $\lim_{m \to \infty} E[Y_m] = c$, the implication is true when
the $Y_m$ are uniformly bounded random variables, and the random variables
$\frac{i N_i}{\sum_{k=1}^{m} X_k}$ are all between 0 and 1.) ■
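
The limit $i p_i/\mu$ is the size-biased version of the family-size distribution. The sketch below computes it exactly and, for comparison, evaluates $P(S_i \mid \mathbf{X}) = i N_i / \sum_k X_k$ on one simulated population. The example pmf, the seed, and the function names are illustrative assumptions, not part of the text.

```python
import random
from fractions import Fraction


def size_biased_pmf(p):
    """Given p[i] = P(family size = i), return the limiting probabilities
    i * p[i] / mu that a randomly chosen member belongs to a size-i family."""
    mu = sum(i * pi for i, pi in p.items())
    return {i: i * pi / mu for i, pi in p.items()}


def conditional_probs_for_population(p, m, seed=0):
    """Draw m family sizes and return P(S_i | X) = i * N_i / sum_k X_k."""
    rng = random.Random(seed)
    sizes = list(p)
    weights = [float(p[i]) for i in sizes]
    families = rng.choices(sizes, weights=weights, k=m)
    total_members = sum(families)
    return {i: i * families.count(i) / total_members for i in sizes}


if __name__ == "__main__":
    p = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}
    print(size_biased_pmf(p))                      # limiting values
    print(conditional_probs_for_population(p, 20_000))
```

With this pmf, $\mu = 2$, so a family of size 4 holds a quarter of the families but half of the population.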

The use of conditioning can also result in a more computationally efficient solution 
than a direct calculation. This is illustrated by our next example. 

Example 3.31 Consider $n$ independent trials in which each trial results in one of the
outcomes $1, \ldots, k$ with respective probabilities $p_1, \ldots, p_k$, $\sum_{i=1}^{k} p_i = 1$. Suppose
further that $n > k$, and that we are interested in determining the probability that each
outcome occurs at least once. If we let $A_i$ denote the event that outcome $i$ does not
occur in any of the $n$ trials, then the desired probability is $1 - P(\bigcup_{i=1}^{k} A_i)$, and it can
be obtained by using the inclusion-exclusion theorem as follows:

$$\begin{aligned}
P\left(\bigcup_{i=1}^{k} A_i\right) = {} & \sum_{i=1}^{k} P(A_i) - \sum_{i} \sum_{j > i} P(A_i A_j) \\
& + \sum_{i} \sum_{j > i} \sum_{k > j} P(A_i A_j A_k) - \cdots + (-1)^{k+1} P(A_1 \cdots A_k)
\end{aligned}$$

where

$$\begin{aligned}
P(A_i) &= (1 - p_i)^n \\
P(A_i A_j) &= (1 - p_i - p_j)^n, \quad i < j \\
P(A_i A_j A_k) &= (1 - p_i - p_j - p_k)^n, \quad i < j < k
\end{aligned}$$

The difficulty with the preceding solution is that its computation requires the calculation
of $2^k - 1$ terms, each of which is a quantity raised to the power $n$. The preceding solution
is thus computationally inefficient when $k$ is large. Let us now see how to make use of
conditioning to obtain an efficient solution.

To begin, note that if we start by conditioning on $N_k$ (the number of times that
outcome $k$ occurs) then when $N_k > 0$ the resulting conditional probability will equal
the probability that all of the outcomes $1, \ldots, k - 1$ occur at least once when $n - N_k$
trials are performed, and each results in outcome $i$ with probability $p_i / \sum_{j=1}^{k-1} p_j$,
$i = 1, \ldots, k - 1$. We could then use a similar conditioning step on these terms.

To follow through on the preceding idea, let $A_{m,r}$, for $m \leq n$, $r \leq k$, denote the
event that each of the outcomes $1, \ldots, r$ occurs at least once when $m$ independent trials
are performed, where each trial results in one of the outcomes $1, \ldots, r$ with respective
probabilities $p_1/P_r, \ldots, p_r/P_r$, where $P_r = \sum_{j=1}^{r} p_j$. Let $P(m, r) = P(A_{m,r})$
and note that $P(n, k)$ is the desired probability. To obtain an expression for $P(m, r)$,
condition on the number of times that outcome $r$ occurs. This gives

$$\begin{aligned}
P(m, r) &= \sum_{j=0}^{m} P\{A_{m,r} \mid r \text{ occurs } j \text{ times}\} \binom{m}{j} \left(\frac{p_r}{P_r}\right)^{j} \left(1 - \frac{p_r}{P_r}\right)^{m-j} \\
&= \sum_{j=1}^{m-r+1} P(m - j, r - 1) \binom{m}{j} \left(\frac{p_r}{P_r}\right)^{j} \left(1 - \frac{p_r}{P_r}\right)^{m-j}
\end{aligned}$$

Starting with

$$\begin{aligned}
P(m, 1) &= 1, \quad \text{if } m \geq 1 \\
P(m, 1) &= 0, \quad \text{if } m = 0
\end{aligned}$$

we can use the preceding recursion to obtain the quantities $P(m, 2)$, $m = 2, \ldots, n - (k - 2)$,
and then the quantities $P(m, 3)$, $m = 3, \ldots, n - (k - 3)$, and so on, up to
$P(m, k - 1)$, $m = k - 1, \ldots, n - 1$. At this point we can then use the recursion to
compute $P(n, k)$. It is not difficult to check that the amount of computation needed is
a polynomial function of $k$, which will be much smaller than $2^k$ when $k$ is large. ■
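
The recursion for $P(m, r)$ can be sketched in a few lines and compared against the inclusion-exclusion sum on a small instance. This is an illustrative implementation under the assumptions of the example; the function names are not from the text, and floating-point arithmetic is used for brevity.

```python
from functools import lru_cache
from itertools import combinations
from math import comb


def prob_all_outcomes_recursive(n, p):
    """P(n, k): probability that each of the k outcomes occurs at least once
    in n trials, computed by conditioning on the count of the last outcome."""
    k = len(p)
    prefix = [0.0]
    for pi in p:
        prefix.append(prefix[-1] + pi)      # prefix[r] = P_r = p_1 + ... + p_r

    @lru_cache(maxsize=None)
    def P(m, r):
        if r == 1:
            return 1.0 if m >= 1 else 0.0
        q = p[r - 1] / prefix[r]            # chance a trial gives outcome r
        # outcome r must occur j >= 1 times, leaving m - j >= r - 1 trials
        return sum(P(m - j, r - 1) * comb(m, j) * q**j * (1 - q)**(m - j)
                   for j in range(1, m - r + 2))

    return P(n, k)


def prob_all_outcomes_inclusion_exclusion(n, p):
    """Same probability via the 2^k - 1 term inclusion-exclusion sum."""
    union = 0.0
    for size in range(1, len(p) + 1):
        sign = (-1) ** (size + 1)
        for subset in combinations(p, size):
            union += sign * (1 - sum(subset)) ** n
    return 1 - union


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    print(prob_all_outcomes_recursive(6, p))
    print(prob_all_outcomes_inclusion_exclusion(6, p))
```

The memoized recursion evaluates on the order of $nk$ states with at most $n$ terms each, whereas the inclusion-exclusion loop visits every nonempty subset of outcomes.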

Our next example is concerned with final score probabilities in serve and rally games 
such as table tennis, squash, paddle ball, volleyball, and others. 

Example 3.32 (Serve and Rally Competitions) Consider a serve and rally competition
involving players A and B. Suppose that each rally that begins with a serve by
player A is won by player A with probability $p_a$ and is won by player B with probability
$q_a = 1 - p_a$. Furthermore, suppose that each rally that begins with a serve by
player B is won by player A with probability $p_b$ and is won by player B with probability
$q_b = 1 - p_b$. Suppose that the winner of the rally earns a point and becomes the
server of the next rally. The competition is decided either when A has won a total of
$N$ points or when B has won a total of $M$. Given that A serves first, we are interested
in determining the final score probabilities.

The format of this example is used in a variety of serve and rally games, including
international volleyball and American squash, both of which changed from their original
format which gave service to the winner of the previous rally but only awarded a point
if the winner of a rally was the server. (See Exercise 84 for an analysis of this latter
format.)

Let $F$ denote the final score, with $F = (i, j)$ meaning that A won a total of $i$ points
and B a total of $j$ points. Clearly

$$P(F = (N, 0)) = p_a^N, \quad P(F = (0, M)) = q_a q_b^{M-1}$$

To determine the other final score probabilities, imagine that A and B continue to play
even after the competition is decided. Define the concept of a "round" by letting the
initial serve of A start the first round and letting a new round begin each time A serves.
Let $B_i$ denote the number of points won by B in round $i$. Note that if the first point of
a round is won by A, then that round ends with B winning 0 points in it. On the other
hand, if B wins the first point in a round then B will continue serving until A wins a
point, showing that the number of points won by B in a round is equal to the number
of times that B serves. Because the number of consecutive serves of B before A wins
a point is geometric with parameter $p_b$, we see that

$$B_i = \begin{cases} 0, & \text{with probability } p_a \\ \text{Geometric}(p_b), & \text{with probability } q_a \end{cases}$$

That is,

$$\begin{aligned}
P(B_i = 0) &= p_a \\
P\{B_i = k \mid B_i > 0\} &= q_b^{k-1} p_b, \quad k > 0
\end{aligned}$$


Because a new round begins each time A wins a point, it follows that $B_i$ is the number
of points that B wins between the time that A has won $i - 1$ points until A has won $i$
points. Consequently, $B(n) \equiv \sum_{i=1}^{n} B_i$ is the number of points that B has won at the
moment that A wins its $n$th point. Noting that the final score will be $(N, m)$, $m < M$,
if $B(N) = m$, let us determine $P(B(n) = m)$ for $m > 0$. To do so, we condition on
the number of $B_1, \ldots, B_n$ that are positive. Calling this number $Y$, that is,

$$Y = \text{number of } i \leq n \text{ such that } B_i > 0$$

we obtain

$$\begin{aligned}
P(B(n) = m) &= \sum_{r=0}^{n} P(B(n) = m \mid Y = r) P(Y = r) \\
&= \sum_{r=1}^{n} P(B(n) = m \mid Y = r) P(Y = r)
\end{aligned}$$

where the last equality followed since $m > 0$ and so $P(B(n) = m \mid Y = 0) = 0$. Because
each of $B_1, \ldots, B_n$ is independently positive with probability $q_a$, it follows that $Y$, the
number of them that are positive, is binomial with parameters $n$, $q_a$. Consequently,

$$P(B(n) = m) = \sum_{r=1}^{n} P(B(n) = m \mid Y = r) \binom{n}{r} q_a^r p_a^{n-r}$$

Now, if $r$ of the variables $B_1, \ldots, B_n$ are positive, then $B(n)$ is distributed as the
sum of $r$ independent geometric random variables with parameter $p_b$, which is the
negative binomial distribution of the number of trials until there have been $r$ successes
when each trial is independently a success with probability $p_b$. Hence,

$$P(B(n) = m \mid Y = r) = \binom{m-1}{r-1} p_b^r q_b^{m-r}$$

where we are using the convention that $\binom{a}{b} = 0$ if $b > a$. This gives

$$\begin{aligned}
P(B(n) = m) &= \sum_{r=1}^{n} \binom{m-1}{r-1} p_b^r q_b^{m-r} \binom{n}{r} q_a^r p_a^{n-r} \\
&= q_b^m p_a^n \sum_{r=1}^{n} \binom{n}{r} \binom{m-1}{r-1} \left(\frac{p_b q_a}{q_b p_a}\right)^r
\end{aligned}$$

Thus, we have shown that

$$P(F = (N, m)) = P(B(N) = m) = q_b^m p_a^N \sum_{r=1}^{N} \binom{N}{r} \binom{m-1}{r-1} \left(\frac{p_b q_a}{q_b p_a}\right)^r, \quad 0 < m < M$$
To determine the probability that the final score will be $(n, M)$, $0 < n < N$, we condition
on the number of points that B has won at the moment that A wins its $n$th point to obtain

$$\begin{aligned}
P(F = (n, M)) &= \sum_{m=0}^{\infty} P(F = (n, M) \mid B(n) = m) P(B(n) = m) \\
&= \sum_{m=0}^{M-1} P(F = (n, M) \mid B(n) = m) P(B(n) = m)
\end{aligned}$$

Now, given that B has $m < M$ points at the moment that A wins its $n$th point, in
order for the final score to be $(n, M)$, B must win the next point with A serving and
must then win the final $M - m - 1$ points on its serve. Hence,
$P(F = (n, M) \mid B(n) = m) = q_a q_b^{M-m-1}$, giving that

$$\begin{aligned}
P(F = (n, M)) &= \sum_{m=0}^{M-1} q_a q_b^{M-m-1} P(B(n) = m) \\
&= q_a q_b^{M-1} p_a^n + \sum_{m=1}^{M-1} q_a q_b^{M-m-1} P(B(n) = m) \\
&= q_a q_b^{M-1} p_a^n \left[1 + \sum_{m=1}^{M-1} \sum_{r=1}^{n} \binom{n}{r} \binom{m-1}{r-1} \left(\frac{p_b q_a}{q_b p_a}\right)^r\right], \quad 0 < n < N
\end{aligned}$$
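
The final score formulas can be checked numerically against a direct rally-by-rally computation. The sketch below is illustrative only: `prob_B_points` evaluates $P(B(n) = m)$ from the sum over $r$, `final_score_probs` assembles the final score probabilities, and `final_score_probs_dp` propagates probability over the states (A's points, B's points, server). All three names are assumptions made for this example.

```python
from collections import defaultdict
from math import comb


def prob_B_points(n, m, pa, pb):
    """P(B(n) = m): B has m points at the moment A wins its nth point."""
    qa, qb = 1 - pa, 1 - pb
    if m == 0:
        return pa ** n
    return sum(comb(m - 1, r - 1) * pb**r * qb**(m - r)
               * comb(n, r) * qa**r * pa**(n - r)
               for r in range(1, min(n, m) + 1))


def final_score_probs(N, M, pa, pb):
    """Final score probabilities from the conditioning argument."""
    qa, qb = 1 - pa, 1 - pb
    probs = {(N, m): prob_B_points(N, m, pa, pb) for m in range(M)}
    for n in range(N):
        probs[(n, M)] = sum(qa * qb**(M - m - 1) * prob_B_points(n, m, pa, pb)
                            for m in range(M))
    return probs


def final_score_probs_dp(N, M, pa, pb):
    """Independent check: push probability forward one rally at a time."""
    state = {(0, 0, True): 1.0}             # (A points, B points, A serves?)
    final = defaultdict(float)
    while state:
        nxt = defaultdict(float)
        for (i, j, a_serves), pr in state.items():
            p = pa if a_serves else pb      # chance that A wins this rally
            for ni, nj, ns, w in ((i + 1, j, True, p), (i, j + 1, False, 1 - p)):
                if ni == N or nj == M:
                    final[(ni, nj)] += pr * w
                else:
                    nxt[(ni, nj, ns)] += pr * w
        state = nxt
    return dict(final)


if __name__ == "__main__":
    for score, pr in sorted(final_score_probs(3, 4, 0.6, 0.45).items()):
        print(score, round(pr, 6))
```

The closed-form route avoids tracking the server explicitly, which is what the "round" construction in the example achieves.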

As noted previously, conditional expectations given that $Y = y$ are exactly the same
as ordinary expectations except that all probabilities are computed conditional on the
event that $Y = y$. As such, conditional expectations satisfy all the properties of ordinary
expectations. For instance, the analog of

$$E[X] = \begin{cases} \sum_{w} E[X \mid W = w] P\{W = w\}, & \text{if } W \text{ is discrete} \\ \int_{w} E[X \mid W = w] f_W(w)\, dw, & \text{if } W \text{ is continuous} \end{cases}$$

is

$$E[X \mid Y = y] = \begin{cases} \sum_{w} E[X \mid W = w, Y = y] P\{W = w \mid Y = y\}, & \text{if } W \text{ is discrete} \\ \int_{w} E[X \mid W = w, Y = y] f_{W \mid Y}(w \mid y)\, dw, & \text{if } W \text{ is continuous} \end{cases}$$

If $E[X \mid Y, W]$ is defined to be that function of $Y$ and $W$ that, when $Y = y$ and $W = w$,
is equal to $E[X \mid Y = y, W = w]$, then the preceding can be written as

$$E[X \mid Y] = E[E[X \mid Y, W] \mid Y]$$


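Example 3.33 An automobile insurance company classifies each of its policyholders
as being of one of the types $i = 1, \ldots, k$. It supposes that the numbers of accidents that
a type $i$ policyholder has in successive years are independent Poisson random variables
with mean $\lambda_i$, $i = 1, \ldots, k$. The probability that a newly insured policyholder is type $i$
is $p_i$, $\sum_{i=1}^{k} p_i = 1$. Given that a policyholder had $n$ accidents in her first year, what is
the expected number that she has in her second year? What is the conditional probability
that she has $m$ accidents in her second year?

Solution: Let $N_i$ denote the number of accidents the policyholder has in year $i$, $i = 1, 2$.
To obtain $E[N_2 \mid N_1 = n]$, condition on her risk type $T$.

$$\begin{aligned}
E[N_2 \mid N_1 = n] &= \sum_{j=1}^{k} E[N_2 \mid T = j, N_1 = n] P\{T = j \mid N_1 = n\} \\
&= \sum_{j=1}^{k} E[N_2 \mid T = j] P\{T = j \mid N_1 = n\} \\
&= \sum_{j=1}^{k} \lambda_j P\{T = j \mid N_1 = n\} \\
&= \frac{\sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n+1} p_j}{\sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n} p_j}
\end{aligned}$$

where the final equality used that

$$\begin{aligned}
P\{T = j \mid N_1 = n\} &= \frac{P\{T = j, N_1 = n\}}{P\{N_1 = n\}} \\
&= \frac{P\{N_1 = n \mid T = j\} P\{T = j\}}{\sum_{j=1}^{k} P\{N_1 = n \mid T = j\} P\{T = j\}} \\
&= \frac{p_j e^{-\lambda_j} \lambda_j^{n} / n!}{\sum_{j=1}^{k} p_j e^{-\lambda_j} \lambda_j^{n} / n!}
\end{aligned}$$

The conditional probability that the policyholder has $m$ accidents in year 2 given
that she had $n$ in year 1 can also be obtained by conditioning on her type.

$$\begin{aligned}
P\{N_2 = m \mid N_1 = n\} &= \sum_{j=1}^{k} P\{N_2 = m \mid T = j, N_1 = n\} P\{T = j \mid N_1 = n\} \\
&= \sum_{j=1}^{k} e^{-\lambda_j} \frac{\lambda_j^{m}}{m!} P\{T = j \mid N_1 = n\} \\
&= \frac{\sum_{j=1}^{k} e^{-2\lambda_j} \lambda_j^{m+n} p_j}{m! \sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n} p_j}
\end{aligned}$$

Another way to calculate $P\{N_2 = m \mid N_1 = n\}$ is first to write

$$P\{N_2 = m \mid N_1 = n\} = \frac{P\{N_2 = m, N_1 = n\}}{P\{N_1 = n\}}$$

and then determine both the numerator and denominator by conditioning on $T$. This
yields

$$\begin{aligned}
P\{N_2 = m \mid N_1 = n\} &= \frac{\sum_{j=1}^{k} P\{N_2 = m, N_1 = n \mid T = j\} p_j}{\sum_{j=1}^{k} P\{N_1 = n \mid T = j\} p_j} \\
&= \frac{\sum_{j=1}^{k} e^{-\lambda_j} \frac{\lambda_j^{m}}{m!} e^{-\lambda_j} \frac{\lambda_j^{n}}{n!} p_j}{\sum_{j=1}^{k} e^{-\lambda_j} \frac{\lambda_j^{n}}{n!} p_j} \\
&= \frac{\sum_{j=1}^{k} e^{-2\lambda_j} \lambda_j^{m+n} p_j}{m! \sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n} p_j}
\end{aligned}$$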
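
The two-step calculation in Example 3.33 (posterior over the risk type, then a mixture over types) can be sketched as follows. The function names and the numerical values of $\lambda_j$ and $p_j$ are illustrative assumptions, not values from the text.

```python
from math import exp, factorial


def posterior_type(n, lam, p):
    """P{T = j | N1 = n} for each type j; the common factor 1/n! cancels."""
    weights = [pj * exp(-lj) * lj**n for lj, pj in zip(lam, p)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_next_year(n, lam, p):
    """E[N2 | N1 = n] = sum_j lambda_j * P{T = j | N1 = n}."""
    return sum(lj * q for lj, q in zip(lam, posterior_type(n, lam, p)))


def prob_next_year(m, n, lam, p):
    """P{N2 = m | N1 = n}, a posterior-weighted mixture of Poisson pmfs."""
    post = posterior_type(n, lam, p)
    return sum(exp(-lj) * lj**m / factorial(m) * q for lj, q in zip(lam, post))


if __name__ == "__main__":
    lam, p = [0.5, 2.0], [0.7, 0.3]    # hypothetical low-risk and high-risk types
    for n in range(4):
        print(n, expected_next_year(n, lam, p))
```

As $n$ grows, the posterior shifts toward the high-rate type, so the expected number of second-year accidents increases with the first-year count.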


3.6 Some Applications 

3.6.1 A List Model 

Consider $n$ elements, $e_1, e_2, \ldots, e_n$, that are initially arranged in some ordered list.
At each unit of time a request is made for one of these elements, $e_i$ being requested,
independently of the past, with probability $P_i$. After being requested the element is then
moved to the front of the list. That is, for instance, if the present ordering is $e_1, e_2, e_3, e_4$,
and $e_3$ is requested, then the next ordering is $e_3, e_1, e_2, e_4$.

We are interested in determining the expected position of the element requested
after this process has been in operation for a long time. However, before computing
this expectation, let us note two possible applications of this model. In the first we have
a stack of reference books. At each unit of time a book is randomly selected and is then
returned to the top of the stack. In the second application we have a computer receiving
requests for elements stored in its memory. The request probabilities for the elements
may not be known, so to reduce the average time it takes the computer to locate the
element requested (which is proportional to the position of the requested element if
the computer locates the element by starting at the beginning and then going down the
list), the computer is programmed to replace the requested element at the beginning of
the list.

To compute the expected position of the element requested, we start by conditioning
on which element is selected. This yields

$$\begin{aligned}
E[\text{position of element requested}] &= \sum_{i=1}^{n} E[\text{position} \mid e_i \text{ is selected}] P_i \\
&= \sum_{i=1}^{n} E[\text{position of } e_i \mid e_i \text{ is selected}] P_i \\
&= \sum_{i=1}^{n} E[\text{position of } e_i] P_i \qquad (3.15)
\end{aligned}$$

where the final equality used that the position of $e_i$ and the event that $e_i$ is selected are
independent because, regardless of its position, $e_i$ is selected with probability $P_i$.
Now,

$$\text{position of } e_i = 1 + \sum_{j \neq i} I_j$$

where

$$I_j = \begin{cases} 1, & \text{if } e_j \text{ precedes } e_i \\ 0, & \text{otherwise} \end{cases}$$

and so,

$$\begin{aligned}
E[\text{position of } e_i] &= 1 + \sum_{j \neq i} E[I_j] \\
&= 1 + \sum_{j \neq i} P\{e_j \text{ precedes } e_i\} \qquad (3.16)
\end{aligned}$$

To compute $P\{e_j \text{ precedes } e_i\}$, note that $e_j$ will precede $e_i$ if the most recent request for
either of them was for $e_j$. But given that a request is for either $e_i$ or $e_j$, the probability
that it is for $e_j$ is

$$P\{e_j \mid e_i \text{ or } e_j\} = \frac{P_j}{P_i + P_j}$$

and, thus,

$$P\{e_j \text{ precedes } e_i\} = \frac{P_j}{P_i + P_j}$$

Hence, from Equations (3.15) and (3.16) we see that

$$E[\text{position of element requested}] = 1 + \sum_{i=1}^{n} P_i \sum_{j \neq i} \frac{P_j}{P_i + P_j}$$

This list model will be further analyzed in Section 4.8, where we will assume a different
reordering rule, namely, that the element requested is moved one closer to the front of
the list as opposed to being moved to the front of the list as assumed here. We will show
there that the average position of the requested element is less under the one-closer rule
than it is under the front-of-the-line rule.
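
The closed form for the expected position under the move-to-front rule can be compared with a long simulated run. This is a sketch under the model's assumptions; the request probabilities, run length, burn-in, and seed are arbitrary illustrative choices, and the function names are not from the text.

```python
import random


def expected_position_formula(P):
    """1 + sum_i P_i * sum_{j != i} P_j / (P_i + P_j)."""
    n = len(P)
    return 1 + sum(P[i] * sum(P[j] / (P[i] + P[j]) for j in range(n) if j != i)
                   for i in range(n))


def simulate_move_to_front(P, steps=100_000, burn_in=1_000, seed=0):
    """Average position (1-based) of the requested element over a long run."""
    rng = random.Random(seed)
    order = list(range(len(P)))                  # current list, front first
    requests = rng.choices(range(len(P)), weights=P, k=burn_in + steps)
    total = 0
    for t, e in enumerate(requests):
        pos = order.index(e)
        if t >= burn_in:                         # ignore the initial ordering
            total += pos + 1
        order.insert(0, order.pop(pos))          # move requested element to front
    return total / steps


if __name__ == "__main__":
    P = [0.5, 0.3, 0.15, 0.05]
    print(expected_position_formula(P))
    print(simulate_move_to_front(P))
```

When all $P_i$ equal $1/n$, the formula reduces to $(n + 1)/2$, the average position in a list whose order carries no information.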


3.6.2 A Random Graph 

A graph consists of a set V of elements called nodes and a set A of pairs of elements 
of V called arcs. A graph can be represented graphically by drawing circles for nodes 
and drawing lines between nodes $i$ and $j$ whenever $(i, j)$ is an arc. For instance, if
V = {1, 2, 3, 4} and A = {(1, 2), (1, 4), (2, 3), (1, 2), (3, 3)}, then we can represent 
this graph as shown in Figure 3.1. Note that the arcs have no direction (a graph in which 
the arcs are ordered pairs of nodes is called a directed graph); and that in the figure 
there are multiple arcs connecting nodes 1 and 2, and a self-arc (called a self-loop) 
from node 3 to itself. 

We say that there exists a path from node $i$ to node $j$, $i \neq j$, if there exists a sequence
of nodes $i, i_1, \ldots, i_k, j$ such that $(i, i_1), (i_1, i_2), \ldots, (i_k, j)$ are all arcs. If there is a
path between each of the $\binom{n}{2}$ distinct pairs of nodes we say that the graph is connected.
The graph in Figure 3.1 is connected but the graph in Figure 3.2 is not. Consider now
the following graph where $V = \{1, 2, \ldots, n\}$ and $A = \{(i, X(i)), i = 1, \ldots, n\}$ where
the $X(i)$ are independent random variables such that

$$P\{X(i) = j\} = \frac{1}{n}, \quad j = 1, 2, \ldots, n$$

In other words, from each node $i$ we select at random one of the $n$ nodes (including
possibly the node $i$ itself) and then join node $i$ and the selected node with an arc. Such
a graph is commonly referred to as a random graph.



Figure 3.1 A graph. 
Figure 3.3


We are interested in determining the probability that the random graph so obtained
is connected. As a prelude, starting at some node (say, node 1) let us follow the
sequence of nodes, $1, X(1), X^2(1), \ldots$, where $X^n(1) = X(X^{n-1}(1))$; and define $N$ to
equal the first $k$ such that $X^k(1)$ is not a new node. In other words,

$$N = \text{1st } k \text{ such that } X^k(1) \in \{1, X(1), \ldots, X^{k-1}(1)\}$$

We can represent this as shown in Figure 3.3 where the arc from $X^{N-1}(1)$ goes back
to a node previously visited.

To obtain the probability that the graph is connected we first condition on $N$ to
obtain

$$P\{\text{graph is connected}\} = \sum_{k=1}^{n} P\{\text{connected} \mid N = k\} P\{N = k\} \qquad (3.17)$$

Now, given that $N = k$, the $k$ nodes $1, X(1), \ldots, X^{k-1}(1)$ are connected to each other,
and there are no other arcs emanating out of these nodes. In other words, if we regard
these $k$ nodes as being one supernode, the situation is the same as if we had one
supernode and $n - k$ ordinary nodes with arcs emanating from the ordinary nodes,
each arc going into the supernode with probability $k/n$. The solution in this situation
is obtained from Lemma 3.1 by taking $r = n - k$.

Lemma 3.1 Given a random graph consisting of nodes $0, 1, \ldots, r$ and $r$ arcs,
namely, $(i, Y_i)$, $i = 1, \ldots, r$, where

$$Y_i = \begin{cases} j & \text{with probability } \dfrac{1}{r + k}, \quad j = 1, \ldots, r \\[2ex] 0 & \text{with probability } \dfrac{k}{r + k} \end{cases}$$

then

$$P\{\text{graph is connected}\} = \frac{k}{r + k}$$

(In other words, for the preceding graph there are $r + 1$ nodes: $r$ ordinary nodes
and one supernode. Out of each ordinary node an arc is chosen. The arc goes to the
supernode with probability $k/(r + k)$ and to each of the ordinary ones with probability
$1/(r + k)$. There is no arc emanating out of the supernode.)

Proof. The proof is by induction on $r$. As it is true when $r = 1$ for any $k$, assume it
true for all values less than $r$. Now, in the case under consideration, let us first condition
on the number of arcs $(j, Y_j)$ for which $Y_j = 0$. This yields

$$P\{\text{connected}\} = \sum_{i=0}^{r} P\{\text{connected} \mid i \text{ of the } Y_j = 0\} \binom{r}{i} \left(\frac{k}{r + k}\right)^{i} \left(\frac{r}{r + k}\right)^{r-i} \qquad (3.18)$$

Now, given that exactly $i$ of the arcs are into the supernode (see Figure 3.4), the situation
for the remaining $r - i$ arcs which do not go into the supernode is the same as if we
had $r - i$ ordinary nodes and one supernode with an arc going out of each of the
ordinary nodes: into the supernode with probability $i/r$ and into each ordinary node
with probability $1/r$. But by the induction hypothesis the probability that this would
lead to a connected graph is $i/r$.

Hence,

$$P\{\text{connected} \mid i \text{ of the } Y_j = 0\} = \frac{i}{r}$$

and from Equation (3.18)

$$\begin{aligned}
P\{\text{connected}\} &= \sum_{i=0}^{r} \frac{i}{r} \binom{r}{i} \left(\frac{k}{r + k}\right)^{i} \left(\frac{r}{r + k}\right)^{r-i} \\
&= \frac{1}{r} E\left[\text{binomial}\left(r, \frac{k}{r + k}\right)\right] \\
&= \frac{k}{r + k}
\end{aligned}$$

which completes the proof of the lemma.



Figure 3.4 The situation given that i of the r arcs are into the supernode. 
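Hence, as the situation given $N = k$ is exactly as described by Lemma 3.1 when
$r = n - k$, we see that, for the original graph,

$$P\{\text{graph is connected} \mid N = k\} = \frac{k}{n}$$

and, from Equation (3.17),

$$P\{\text{graph is connected}\} = \frac{E(N)}{n} \qquad (3.19)$$

To compute $E(N)$ we use the identity

$$E(N) = \sum_{i=1}^{\infty} P\{N \geq i\}$$

which can be proved by defining indicator variables $I_i$, $i \geq 1$, by

$$I_i = \begin{cases} 1, & \text{if } i \leq N \\ 0, & \text{if } i > N \end{cases}$$

Hence,

$$N = \sum_{i=1}^{\infty} I_i$$

and so

$$E(N) = E\left[\sum_{i=1}^{\infty} I_i\right] = \sum_{i=1}^{\infty} E[I_i] = \sum_{i=1}^{\infty} P\{N \geq i\} \qquad (3.20)$$

Now, the event $\{N \geq i\}$ occurs if the nodes $1, X(1), \ldots, X^{i-1}(1)$ are all distinct.
Hence,

$$P\{N \geq i\} = \frac{(n-1)}{n} \frac{(n-2)}{n} \cdots \frac{(n-i+1)}{n} = \frac{(n-1)!}{(n-i)!\, n^{i-1}}$$

and so, from Equations (3.19) and (3.20),

$$\begin{aligned}
P\{\text{graph is connected}\} &= (n-1)! \sum_{i=1}^{n} \frac{1}{(n-i)!\, n^{i}} \\
&= \frac{(n-1)!}{n^{n}} \sum_{j=0}^{n-1} \frac{n^{j}}{j!} \quad (\text{by } j = n - i) \qquad (3.21)
\end{aligned}$$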
We can also use Equation (3.21) to obtain a simple approximate expression for the
probability that the graph is connected when $n$ is large. To do so, we first note that if
$X$ is a Poisson random variable with mean $n$, then

$$P\{X < n\} = e^{-n} \sum_{j=0}^{n-1} \frac{n^{j}}{j!}$$

Since a Poisson random variable with mean $n$ can be regarded as being the sum of $n$
independent Poisson random variables each with mean 1, it follows from the central
limit theorem that for $n$ large such a random variable has approximately a normal
distribution and as such has probability $\frac{1}{2}$ of being less than its mean. That is, for $n$
large,

$$P\{X < n\} \approx \frac{1}{2}$$

and so for $n$ large,

$$\sum_{j=0}^{n-1} \frac{n^{j}}{j!} \approx \frac{e^{n}}{2}$$

Hence, from Equation (3.21), for $n$ large,

$$P\{\text{graph is connected}\} \approx \frac{e^{n} (n-1)!}{2 n^{n}}$$

By employing an approximation due to Stirling that states that for $n$ large,

$$n! \approx n^{n + 1/2} e^{-n} \sqrt{2\pi}$$

we see that, for $n$ large,

$$P\{\text{graph is connected}\} \approx \sqrt{\frac{\pi}{2(n-1)}}\; e \left(\frac{n-1}{n}\right)^{n}$$

and as

$$\lim_{n \to \infty} \left(\frac{n-1}{n}\right)^{n} = \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^{n} = e^{-1}$$

we see that, for $n$ large,

$$P\{\text{graph is connected}\} \approx \sqrt{\frac{\pi}{2(n-1)}}$$
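
For small $n$ the connectivity probability can be verified by enumerating all $n^n$ choices of $(X(1), \ldots, X(n))$, and for larger $n$ the exact sum can be compared with the square-root approximation. The sketch below uses a simple union-find for the brute-force check; all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial, pi, sqrt


def p_connected_exact(n):
    """((n-1)! / n^n) * sum_{j=0}^{n-1} n^j / j!, as an exact fraction."""
    return (Fraction(factorial(n - 1), n**n)
            * sum(Fraction(n**j, factorial(j)) for j in range(n)))


def p_connected_brute_force(n):
    """Fraction of the n^n equally likely arc choices giving a connected graph."""
    connected = 0
    for X in product(range(n), repeat=n):
        parent = list(range(n))

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path halving
                v = parent[v]
            return v

        for i, j in enumerate(X):               # arc joins node i and node X(i)
            parent[find(i)] = find(j)
        if len({find(v) for v in range(n)}) == 1:
            connected += 1
    return Fraction(connected, n**n)


def p_connected_asymptotic(n):
    """Large-n approximation sqrt(pi / (2 (n - 1)))."""
    return sqrt(pi / (2 * (n - 1)))


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, p_connected_exact(n), p_connected_brute_force(n))
    print(float(p_connected_exact(200)), p_connected_asymptotic(200))
```

The approximation is only first order: the normal approximation to the Poisson probability carries an error of order $1/\sqrt{n}$, so agreement at moderate $n$ is to within a few percent rather than exact.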



Now a graph is said to consist of $r$ connected components if its nodes can be
partitioned into $r$ subsets so that each of the subsets is connected and there are no arcs
between nodes in different subsets. For instance, the graph in Figure 3.5 consists of
three connected components, namely, $\{1, 2, 3\}$, $\{4, 5\}$, and $\{6\}$. Let $C$ denote the
number of connected components of our random graph and let

$$P_n(i) = P\{C = i\}$$

where we use the notation $P_n(i)$ to make explicit the dependence on $n$, the number
of nodes. Since a connected graph is by definition a graph consisting of exactly one
component, from Equation (3.21) we have

$$P_n(1) = P\{C = 1\} = \frac{(n-1)!}{n^{n}} \sum_{j=0}^{n-1} \frac{n^{j}}{j!} \qquad (3.22)$$

Figure 3.5 A graph having three connected components.

To obtain $P_n(2)$, the probability of exactly two components, let us first fix attention on
some particular node, say, node 1. In order that a given set of $k - 1$ other nodes (say,
nodes $2, \ldots, k$) will along with node 1 constitute one connected component, and the
remaining $n - k$ a second connected component, we must have

(i) $X(i) \in \{1, 2, \ldots, k\}$, for all $i = 1, \ldots, k$.

(ii) $X(i) \in \{k + 1, \ldots, n\}$, for all $i = k + 1, \ldots, n$.

(iii) The nodes $1, 2, \ldots, k$ form a connected subgraph.

(iv) The nodes $k + 1, \ldots, n$ form a connected subgraph.

The probability of the preceding occurring is clearly

$$\left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} P_k(1) P_{n-k}(1)$$

and because there are $\binom{n-1}{k-1}$ ways of choosing a set of $k - 1$ nodes from the nodes 2
through $n$, we have

$$P_n(2) = \sum_{k=1}^{n-1} \binom{n-1}{k-1} \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} P_k(1) P_{n-k}(1)$$

and so $P_n(2)$ can be computed from Equation (3.22). In general, the recursive formula
for $P_n(i)$ is given by

$$P_n(i) = \sum_{k=1}^{n-i+1} \binom{n-1}{k-1} \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} P_k(1) P_{n-k}(i-1)$$
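
The recursion for $P_n(i)$ translates directly into a memoized function. The sketch below uses exact rational arithmetic so that the probabilities can be checked to sum to 1; the name `P` mirrors the notation of the text but the implementation itself is only an illustration.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def P(n, i):
    """P_n(i): probability that the random graph on n nodes has i components."""
    if i < 1 or i > n:
        return Fraction(0)
    if i == 1:
        # Equation (3.22)
        return (Fraction(factorial(n - 1), n**n)
                * sum(Fraction(n**j, factorial(j)) for j in range(n)))
    # k is the size of the component that contains node 1
    return sum(comb(n - 1, k - 1)
               * Fraction(k, n)**k * Fraction(n - k, n)**(n - k)
               * P(k, 1) * P(n - k, i - 1)
               for k in range(1, n - i + 2))


if __name__ == "__main__":
    n = 6
    pmf = [P(n, i) for i in range(1, n + 1)]
    print([float(x) for x in pmf], sum(pmf))
```

For $n = 3$ this gives $17/27$, $9/27$, and $1/27$ for one, two, and three components; the last value is the probability that every node selects itself.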
Figure 3.6 A cycle.

To compute $E[C]$, the expected number of connected components, first note that every
connected component of our random graph must contain exactly one cycle (a cycle
is a set of arcs of the form $(i, i_1), (i_1, i_2), \ldots, (i_{k-1}, i_k), (i_k, i)$ for distinct nodes
$i, i_1, \ldots, i_k$). For example, Figure 3.6 depicts a cycle.

The fact that every connected component of our random graph must contain exactly
one cycle is most easily proved by noting that if the connected component consists of $r$
nodes, then it must also have $r$ arcs and, hence, must contain exactly one cycle (why?).
Thus, we see that

$$\begin{aligned}
E[C] &= E[\text{number of cycles}] \\
&= E\left[\sum_{S} I(S)\right] \\
&= \sum_{S} E[I(S)]
\end{aligned}$$

where the sum is over all subsets $S \subset \{1, 2, \ldots, n\}$ and

$$I(S) = \begin{cases} 1, & \text{if the nodes in } S \text{ are all the nodes of a cycle} \\ 0, & \text{otherwise} \end{cases}$$

Now, if $S$ consists of $k$ nodes, say $1, \ldots, k$, then

$$\begin{aligned}
E[I(S)] &= P\{1, X(1), \ldots, X^{k-1}(1) \text{ are all distinct and contained in } 1, \ldots, k \text{ and } X^{k}(1) = 1\} \\
&= \frac{k-1}{n} \frac{k-2}{n} \cdots \frac{1}{n} \frac{1}{n} = \frac{(k-1)!}{n^{k}}
\end{aligned}$$

Hence, because there are $\binom{n}{k}$ subsets of size $k$ we see that

$$E[C] = \sum_{k=1}^{n} \binom{n}{k} \frac{(k-1)!}{n^{k}}$$
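
The expected number of components, $\sum_{k=1}^{n} \binom{n}{k} (k-1)!/n^{k}$, can be checked for small $n$ by averaging the component count over all $n^n$ equally likely graphs. The following sketch is illustrative; the function names are assumptions, and the enumeration is practical only for small $n$.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def expected_components_formula(n):
    """E[C] = sum_{k=1}^{n} C(n, k) (k-1)! / n^k."""
    return sum(Fraction(comb(n, k) * factorial(k - 1), n**k)
               for k in range(1, n + 1))


def count_components(X):
    """Number of connected components of the graph with arcs (i, X[i])."""
    parent = list(range(len(X)))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in enumerate(X):
        parent[find(i)] = find(j)
    return len({find(v) for v in range(len(X))})


def expected_components_brute_force(n):
    """Average component count over all n^n arc choices."""
    total = sum(count_components(X) for X in product(range(n), repeat=n))
    return Fraction(total, n**n)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, expected_components_formula(n), expected_components_brute_force(n))
```

For $n = 2$ both routes give $5/4$: the graph is connected with probability $3/4$ and has two components with probability $1/4$.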



3.6.3 Uniform Priors, Polya's Urn Model, and Bose-Einstein Statistics

Suppose that $n$ independent trials, each of which is a success with probability $p$, are
performed. If we let $X$ denote the total number of successes, then $X$ is a binomial
random variable such that

$$P\{X = k \mid p\} = \binom{n}{k} p^{k} (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n$$

However, let us now suppose that whereas the trials all have the same success probability
$p$, its value is not predetermined but is chosen according to a uniform distribution on
$(0, 1)$. (For instance, a coin may be chosen at random from a huge bin of coins representing
a uniform spread over all possible values of $p$, the coin's probability of coming
up heads. The chosen coin is then flipped $n$ times.) In this case, by conditioning on the
actual value of $p$, we have

$$P\{X = k\} = \int_0^1 P\{X = k \mid p\} f(p)\, dp = \int_0^1 \binom{n}{k} p^{k} (1 - p)^{n-k}\, dp$$

Now, it can be shown that

$$\int_0^1 p^{k} (1 - p)^{n-k}\, dp = \frac{k!\,(n-k)!}{(n+1)!} \qquad (3.23)$$

and thus

$$P\{X = k\} = \binom{n}{k} \frac{k!\,(n-k)!}{(n+1)!} = \frac{1}{n+1}, \quad k = 0, 1, \ldots, n \qquad (3.24)$$

In other words, each of the $n + 1$ possible values of $X$ is equally likely.

As an alternate way of describing the preceding experiment, let us compute the
conditional probability that the $(r + 1)$st trial will result in a success given a total of $k$
successes (and $r - k$ failures) in the first $r$ trials.

$$\begin{aligned}
P\{(r + 1)\text{st trial is a success} \mid k \text{ successes in first } r\}
&= \frac{P\{(r + 1)\text{st is a success}, k \text{ successes in first } r \text{ trials}\}}{P\{k \text{ successes in first } r \text{ trials}\}} \\
&= \frac{\int_0^1 P\{(r + 1)\text{st is a success}, k \text{ in first } r \mid p\}\, dp}{1/(r + 1)} \\
&= (r + 1) \int_0^1 \binom{r}{k} p^{k+1} (1 - p)^{r-k}\, dp \\
&= (r + 1) \binom{r}{k} \frac{(k + 1)!\,(r - k)!}{(r + 2)!} \quad \text{by Equation (3.23)} \\
&= \frac{k + 1}{r + 2} \qquad (3.25)
\end{aligned}$$

That is, if the first $r$ trials result in $k$ successes, then the next trial will be a success with
probability $(k + 1)/(r + 2)$.

It follows from Equation (3.25) that the stochastic process of the successive outcomes
of the trials can alternatively be described as follows: There
is an urn that initially contains one white and one black ball. At each stage a ball is
randomly drawn and is then replaced along with another ball of the same color. Thus,
for instance, if of the first $r$ balls drawn, $k$ were white, then the urn at the time of the
$(r + 1)$th draw would consist of $k + 1$ white and $r - k + 1$ black, and thus the next ball
would be white with probability $(k + 1)/(r + 2)$. If we identify the drawing of a white
ball with a successful trial, then we see that this yields an alternate description of the
original model. This latter urn model is called Polya's urn model.
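
Both descriptions lead to the same uniform distribution on $\{0, 1, \ldots, n\}$, which can be confirmed with exact arithmetic. The sketch below evaluates Equation (3.24) through the integral identity (3.23) and separately propagates the urn's draw probabilities $(k + 1)/(r + 2)$; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def prob_k_successes_uniform_prior(n, k):
    """C(n, k) * k! (n-k)! / (n+1)!, i.e. Equation (3.24) via (3.23)."""
    return Fraction(comb(n, k) * factorial(k) * factorial(n - k),
                    factorial(n + 1))


def polya_urn_distribution(n):
    """Exact distribution of the number of white balls among n draws from an
    urn that starts with one white and one black ball."""
    dist = {0: Fraction(1)}
    for r in range(n):                       # r draws have been made so far
        nxt = {}
        for k, pr in dist.items():
            p_white = Fraction(k + 1, r + 2)
            nxt[k + 1] = nxt.get(k + 1, 0) + pr * p_white
            nxt[k] = nxt.get(k, 0) + pr * (1 - p_white)
        dist = nxt
    return dist


if __name__ == "__main__":
    print(polya_urn_distribution(5))
    print([prob_k_successes_uniform_prior(5, k) for k in range(6)])
```

Because the urn recursion uses only the one-step probabilities of Equation (3.25), its agreement with (3.24) illustrates that the two descriptions define the same process.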

Remarks 

(i) In the special case when $k = r$, Equation (3.25) is sometimes called Laplace's
rule of succession, after the French mathematician Pierre de Laplace. In Laplace's
era, this "rule" provoked much controversy, for people attempted to employ it in
diverse situations where its validity was questionable. For instance, it was used to
justify such propositions as "If you have dined twice at a restaurant and both meals
were good, then the next meal also will be good with probability 3/4," and "Since
the sun has risen the past 1,826,213 days, so will it rise tomorrow with probability
1,826,214/1,826,215." The trouble with such claims resides in the fact that it is
not at all clear the situation they are describing can be modeled as consisting of
independent trials having a common probability of success that is itself uniformly
chosen.
(ii) In the original description of the experiment, we referred to the successive trials as 
being independent, and in fact they are independent when the success probability is 
known. However, when p is regarded as a random variable, the successive outcomes 
are no longer independent because knowing whether an outcome is a success or 
not gives us some information about p, which in turn yields information about the 
other outcomes. 

The preceding can be generalized to situations in which each trial has more than
two possible outcomes. Suppose that $n$ independent trials, each resulting in one of $m$
possible outcomes $1, \ldots, m$, with respective probabilities $p_1, \ldots, p_m$ are performed. If
we let $X_i$ denote the number of type $i$ outcomes that result in the $n$ trials, $i = 1, \ldots, m$,
then the vector $X_1, \ldots, X_m$ will have the multinomial distribution given by

$$P\{X_1 = x_1, X_2 = x_2, \ldots, X_m = x_m \mid \mathbf{p}\} = \frac{n!}{x_1! \cdots x_m!} p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m}$$

where $x_1, \ldots, x_m$ is any vector of nonnegative integers that sum to $n$. Now let us
suppose that the vector $\mathbf{p} = (p_1, \ldots, p_m)$ is not specified, but instead is chosen by a
"uniform" distribution. Such a distribution would be of the form

$$f(p_1, \ldots, p_m) = \begin{cases} c, & 0 < p_i < 1,\ i = 1, \ldots, m,\ \sum_{1}^{m} p_i = 1 \\ 0, & \text{otherwise} \end{cases}$$

The preceding multivariate distribution is a special case of what is known as the Dirichlet
distribution, and it is not difficult to show, using the fact that the distribution must
integrate to 1, that $c = (m - 1)!$.

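The unconditional distribution of the vector $\mathbf{X}$ is given by

$$\begin{aligned}
P\{X_1 = x_1, \ldots, X_m = x_m\} &= \int\!\!\int \cdots \int P\{X_1 = x_1, \ldots, X_m = x_m \mid p_1, \ldots, p_m\} f(p_1, \ldots, p_m)\, dp_1 \cdots dp_m \\
&= \frac{(m - 1)!\, n!}{x_1! \cdots x_m!} \int\!\!\int \cdots \int_{\substack{0 < p_i < 1 \\ \sum_{1}^{m} p_i = 1}} p_1^{x_1} \cdots p_m^{x_m}\, dp_1 \cdots dp_m
\end{aligned}$$

Now it can be shown that

$$\int\!\!\int \cdots \int_{\substack{0 < p_i < 1 \\ \sum_{1}^{m} p_i = 1}} p_1^{x_1} \cdots p_m^{x_m}\, dp_1 \cdots dp_m = \frac{x_1! \cdots x_m!}{\left(\sum_{1}^{m} x_i + m - 1\right)!} \qquad (3.26)$$

and thus, using the fact that $\sum_{1}^{m} x_i = n$, we see that

$$P\{X_1 = x_1, \ldots, X_m = x_m\} = \frac{n!\,(m - 1)!}{(n + m - 1)!} = \binom{n + m - 1}{m - 1}^{-1} \qquad (3.27)$$

Hence, all of the $\binom{n + m - 1}{m - 1}$ possible outcomes (there are $\binom{n + m - 1}{m - 1}$ possible nonnegative
integer valued solutions of $x_1 + \cdots + x_m = n$) of the vector $(X_1, \ldots, X_m)$ are equally
likely. The probability distribution given by Equation (3.27) is sometimes called the
Bose-Einstein distribution.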

To obtain an alternative description of the foregoing, let us compute the conditional 
probability that the (n + l)st outcome is of type j if the first n trials have resulted in 
Xj type i outcomes, i = \..... m, Y^’i x i = n - This is given by 


P{(n + l)st is j\xi type i in first n,i = 1, ..., m] 

P{(n + l)st is j , xi type i in first n,i = 1 ,... ,m} 


P{xi type i in first n,i — 1 ,,m] 
n\(m — 1)! 


Xi! 


I . . . 


W-ff-'T 


Pm d Pt ' ' ' dpn 


n + m — 1 
m — 1 


-l 
where the numerator is obtained by conditioning on the p vector and the denominator
is obtained by using Equation (3.27). By Equation (3.26), we have

$$P\{(n+1)\text{st is } j \mid x_i \text{ type } i \text{ in first } n,\ i = 1, \ldots, m\} = \frac{\dfrac{(x_j + 1)\, n!\, (m-1)!}{(n+m)!}}{\dfrac{(m-1)!\, n!}{(n+m-1)!}} = \frac{x_j + 1}{n + m} \tag{3.28}$$

Using Equation (3.28), we can now present an urn model description of the stochastic 
process of successive outcomes. Namely, consider an urn that initially contains one of 
each of m types of balls. Balls are then randomly drawn and are replaced along with 
another of the same type. Hence, if in the first n drawings there have been a total of 
$x_j$ type j balls drawn, then the urn immediately before the (n + 1)st draw will contain
$x_j + 1$ type j balls out of a total of m + n, and so the probability of a type j on the
(n + 1)st draw will be given by Equation (3.28).
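
The urn description can be checked exactly for small cases. The following is a minimal sketch, not part of the original text: the function names `urn_sequence_probability` and `count_vector_distribution` are hypothetical, types are labeled 0, ..., m - 1, and the brute-force enumeration is only practical for small n and m. It multiplies the draw probabilities $(x_j + 1)/(m + n)$ along every possible sequence of draws and accumulates the probability of each count vector.

```python
from fractions import Fraction
from itertools import product
from math import comb

def urn_sequence_probability(draws, m):
    """Probability that the urn produces this exact sequence of ball types."""
    counts = [1] * m                      # the urn starts with one ball of each type
    prob = Fraction(1)
    for j in draws:
        prob *= Fraction(counts[j], sum(counts))   # (x_j + 1) / (m + n)
        counts[j] += 1                    # replace the ball and add another of its type
    return prob

def count_vector_distribution(n, m):
    """Exact distribution of the count vector (x_1, ..., x_m) after n draws."""
    dist = {}
    for draws in product(range(m), repeat=n):
        x = tuple(draws.count(j) for j in range(m))
        dist[x] = dist.get(x, Fraction(0)) + urn_sequence_probability(draws, m)
    return dist

dist = count_vector_distribution(n=4, m=3)
# Number of count vectors, and the common probability each one should carry.
print(len(dist), comb(4 + 3 - 1, 3 - 1), set(dist.values()))
```

Every count vector receives the same probability, the reciprocal of the number of nonnegative integer solutions, which is the Bose-Einstein distribution of Equation (3.27).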

Remark Consider a situation where n particles are to be distributed at random among 
m possible regions; and suppose that the regions appear, at least before the experiment, 
to have the same physical characteristics. It would thus seem that the most likely
distribution for the number of particles that fall into each of the regions is the multinomial
distribution with $p_i = 1/m$. (This, of course, would correspond to each particle,
independent of the others, being equally likely to fall in any of the m regions.) Physicists
studying how particles distribute themselves observed the behavior of such particles 
as photons and atoms containing an even number of elementary particles. However, 
when they studied the resulting data, they were amazed to discover that the observed 
frequencies did not follow the multinomial distribution but rather seemed to follow 
the Bose-Einstein distribution. They were amazed because they could not imagine a 
physical model for the distribution of particles that would result in all possible outcomes
being equally likely. (For instance, if 10 particles are to distribute themselves
between two regions, it hardly seems reasonable that it is just as likely that both regions 
will contain 5 particles as it is that all 10 will fall in region 1 or that all 10 will fall in 
region 2.) 

However, from the results of this section we now have a better understanding of the 
cause of the physicists’ dilemma. In fact, two possible hypotheses present themselves. 
First, it may be that the data gathered by the physicists were actually obtained under 
a variety of different situations, each having its own characteristic p vector that gave 
rise to a uniform spread over all possible p vectors. A second possibility (suggested by 
the urn model interpretation) is that the particles select their regions sequentially and a 
given particle’s probability of falling in a region is roughly proportional to the fraction 
of the landed particles that are in that region. (In other words, the particles presently in 
a region provide an “attractive” force on elements that have not yet landed.) 
3.6.4 Mean Time for Patterns

Let $X = (X_1, X_2, \ldots)$ be a sequence of independent and identically distributed discrete
random variables such that

$$p_i = P\{X_j = i\}$$

For a given subsequence, or pattern, $i_1, \ldots, i_n$ let $T = T(i_1, \ldots, i_n)$ denote the number
of random variables that we need to observe until the pattern appears. For instance, if the
subsequence of interest is 3, 5, 1 and the sequence is X = (5, 3, 1, 3, 5, 3, 5, 1, 6, 2, ...)
then T = 8. We want to determine E[T].

To begin, let us consider whether the pattern has an overlap, where we say that the
pattern $i_1, i_2, \ldots, i_n$ has an overlap if for some k, $1 \le k < n$, the sequence of its final
k elements is the same as that of its first k elements. That is, it has an overlap if for
some $1 \le k < n$,

$$(i_{n-k+1}, \ldots, i_n) = (i_1, \ldots, i_k), \quad k < n$$

For instance, the pattern 3, 5, 1 has no overlaps, whereas the pattern 3, 3, 3 does. 

Case 1 The pattern has no overlaps. 

In this case we will argue that T will equal j + n if and only if the pattern does not
occur within the first j values, and the next n values are $i_1, \ldots, i_n$.

That is,

$$T = j + n \iff \{T > j,\ (X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)\} \tag{3.29}$$

To verify (3.29), note first that T = j + n clearly implies both that T > j and that
$(X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)$. On the other hand, suppose that

$$T > j \quad \text{and} \quad (X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n) \tag{3.30}$$

Let k < n. Because $(i_1, \ldots, i_k) \ne (i_{n-k+1}, \ldots, i_n)$, it follows that $T \ne j + k$. But
(3.30) implies that $T \le j + n$, so we can conclude that T = j + n. Thus we have
verified (3.29).

Using (3.29), we see that 

$$P\{T = j + n\} = P\{T > j,\ (X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)\}$$

However, whether T > j is determined by the values $X_1, \ldots, X_j$, and is thus
independent of $X_{j+1}, \ldots, X_{j+n}$. Consequently,

$$\begin{aligned}
P\{T = j + n\} &= P\{T > j\}\, P\{(X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)\} \\
&= P\{T > j\}\, p
\end{aligned}$$

where

$$p = p_{i_1} p_{i_2} \cdots p_{i_n}$$

Summing both sides of the preceding over all j yields

$$1 = \sum_{j=0}^{\infty} P\{T = j + n\} = p \sum_{j=0}^{\infty} P\{T > j\} = p\,E[T]$$

or

$$E[T] = \frac{1}{p}$$


Case 2 The pattern has overlaps. 

For patterns having overlaps there is a simple trick that will enable us to obtain E[T]
by making use of the result for nonoverlapping patterns. To make the analysis more
transparent, consider a specific pattern, say P = (3, 5, 1, 3, 5). Let x be a value that
does not appear in the pattern, and let $T_x$ denote the time until the pattern $P_x$ =
(3, 5, 1, 3, 5, x) appears. That is, $T_x$ is the time of occurrence of the new pattern that
puts x at the end of the original pattern. Because x did not appear in the original pattern
it follows that the new pattern has no overlaps; thus,

$$E[T_x] = \frac{1}{p_x\, p}$$

where $p = \prod_{j=1}^{n} p_{i_j} = p_3^2\, p_5^2\, p_1$. Because the new pattern can occur only after the
original one, write

$$T_x = T + A$$

where T is the time at which the pattern P = (3, 5, 1, 3, 5) occurs, and A is the additional
time after the occurrence of the pattern P until $P_x$ occurs. Also, let $E[T_x \mid i_1, \ldots, i_r]$
denote the expected additional time after time r until the pattern $P_x$ appears given that
the first r data values are $i_1, \ldots, i_r$. Conditioning on X, the next data value after the
occurrence of the pattern (3, 5, 1, 3, 5), gives

$$E[A \mid X = i] = \begin{cases}
1 + E[T_x \mid 3, 5, 1], & \text{if } i = 1 \\
1 + E[T_x \mid 3], & \text{if } i = 3 \\
1, & \text{if } i = x \\
1 + E[T_x], & \text{if } i \ne 1, 3, x
\end{cases}$$

Therefore,

$$\begin{aligned}
E[T_x] &= E[T] + E[A] \\
&= E[T] + 1 + E[T_x \mid 3, 5, 1]\, p_1 + E[T_x \mid 3]\, p_3 + E[T_x](1 - p_1 - p_3 - p_x)
\end{aligned} \tag{3.31}$$

But

$$E[T_x] = E[T(3, 5, 1)] + E[T_x \mid 3, 5, 1]$$

giving

$$E[T_x \mid 3, 5, 1] = E[T_x] - E[T(3, 5, 1)]$$

Similarly,

$$E[T_x \mid 3] = E[T_x] - E[T(3)]$$

Substituting back into Equation (3.31) gives

$$p_x E[T_x] = E[T] + 1 - p_1 E[T(3, 5, 1)] - p_3 E[T(3)]$$

But, by the result in the nonoverlapping case,

$$E[T(3, 5, 1)] = \frac{1}{p_3\, p_5\, p_1}, \qquad E[T(3)] = \frac{1}{p_3}$$

yielding the result

$$E[T] = p_x E[T_x] + \frac{1}{p_3\, p_5} = \frac{1}{p} + \frac{1}{p_3\, p_5}$$


For another illustration of the technique, let us reconsider Example 3.15, which is
concerned with finding the expected time until n consecutive successes occur in
independent Bernoulli trials. That is, we want E[T], when the pattern is P = (1, 1, ..., 1).
Then, with $x \ne 1$ we consider the nonoverlapping pattern $P_x$ = (1, ..., 1, x), and let
$T_x$ be its occurrence time. With A and X as previously defined, we have

$$E[A \mid X = i] = \begin{cases}
1 + E[A], & \text{if } i = 1 \\
1, & \text{if } i = x \\
1 + E[T_x], & \text{if } i \ne 1, x
\end{cases}$$

Therefore,

$$E[A] = 1 + E[A]\, p_1 + E[T_x](1 - p_1 - p_x)$$

or

$$E[A] = \frac{1}{1 - p_1} + E[T_x]\, \frac{1 - p_1 - p_x}{1 - p_1}$$

Consequently,

$$\begin{aligned}
E[T] &= E[T_x] - E[A] \\
&= \frac{p_x E[T_x] - 1}{1 - p_1} \\
&= \frac{(1/p_1)^n - 1}{1 - p_1}
\end{aligned}$$

where the final equality used that $E[T_x] = \dfrac{1}{p_1^n\, p_x}$.
The mean occurrence time of any overlapping pattern $P = (i_1, \ldots, i_n)$ can be
obtained by the preceding method. Namely, let $T_x$ be the time until the nonoverlapping
pattern $P_x = (i_1, \ldots, i_n, x)$ occurs; then use the identity

$$E[T_x] = E[T] + E[A]$$

to relate E[T] and $E[T_x] = \frac{1}{p\, p_x}$; then condition on the next data value after P occurs
to obtain an expression for E[A] in terms of quantities of the form

$$E[T_x \mid i_1, \ldots, i_r] = E[T_x] - E[T(i_1, \ldots, i_r)]$$

If $(i_1, \ldots, i_r)$ is nonoverlapping, use the nonoverlapping result to obtain
$E[T(i_1, \ldots, i_r)]$; otherwise, repeat the process on the subpattern $(i_1, \ldots, i_r)$.

Remark We can utilize the preceding technique even when the pattern $i_1, \ldots, i_n$
includes all the distinct data values. For instance, in coin tossing the pattern of interest
might be h, t, h. Even in such cases, we should let x be a data value that is not in the
pattern and use the preceding technique (even though $p_x = 0$). Because $p_x$ will appear
only in the final answer in the expression $p_x E[T_x] = \frac{p_x}{p_x\, p}$, by interpreting this fraction
as 1/p we obtain the correct answer. (A rigorous approach, yielding the same result,
would be to reduce one of the positive $p_i$ by $\epsilon$, take $p_x = \epsilon$, solve for E[T], and then
let $\epsilon$ go to 0.) ■
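
Unrolling the recursion above leads to a compact computation. The sketch below is illustrative and not from the text: it assumes the standard closed form that results from repeating the argument, namely that E[T] is the sum of $1/(p_{i_1} \cdots p_{i_k})$ over k = n and over every k < n at which the pattern overlaps itself. The function name `mean_pattern_time` and the dictionary `probs` are hypothetical.

```python
from fractions import Fraction

def mean_pattern_time(pattern, probs):
    """E[T] for the first occurrence of `pattern` in an i.i.d. sequence.

    `probs` maps each data value i to p_i. The sum runs over k = n (the whole
    pattern) and over each k < n for which the last k elements of the pattern
    equal its first k elements (an overlap of length k).
    """
    n = len(pattern)
    total = Fraction(0)
    for k in range(1, n + 1):
        if pattern[:k] == pattern[n - k:]:
            term = Fraction(1)
            for value in pattern[:k]:
                term /= Fraction(probs[value])
            total += term
    return total

die = {i: Fraction(1, 6) for i in range(1, 7)}
coin = {"h": Fraction(1, 2), "t": Fraction(1, 2)}

print(mean_pattern_time((3, 5, 1), die))         # nonoverlapping: 1/p
print(mean_pattern_time((3, 5, 1, 3, 5), die))   # 1/p + 1/(p_3 p_5)
print(mean_pattern_time(("h", "t", "h"), coin))  # pattern using every data value
```

For the pattern (1, ..., 1) the sum is a geometric series that reduces to $((1/p_1)^n - 1)/(1 - p_1)$, matching the expression derived above.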

3.6.5 The k-Record Values of Discrete Random Variables 

Let $X_1, X_2, \ldots$ be independent and identically distributed random variables whose
set of possible values is the positive integers, and let P{X = j}, $j \ge 1$, denote their
common probability mass function. Suppose that these random variables are observed
in sequence, and say that $X_n$ is a k-record value if

$$X_i \ge X_n \quad \text{for exactly } k \text{ of the values } i,\ i = 1, \ldots, n$$

That is, the nth value in the sequence is a k-record value if exactly k of the first n values
(including $X_n$) are at least as large as it. Let $R_k$ denote the ordered set of k-record
values.

It is a rather surprising result that not only do the sequences of k-record values have
the same probability distributions for all k, these sequences are also independent of
each other. This result is known as Ignatov’s theorem.

Theorem 3.1 (Ignatov’s Theorem) $R_k$, $k \ge 1$, are independent and identically
distributed random vectors.

Proof. Define a series of subsequences of the data sequence $X_1, X_2, \ldots$ by letting
the ith subsequence consist of all data values that are at least as large as i, $i \ge 1$. For
instance, if the data sequence is

2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1, ...

then the subsequences are as follows:

≥ 1: 2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1, ...

≥ 2: 2, 5, 6, 9, 8, 3, 4, 5, 7, 8, 2, 3, 4, 2, 5, 6, ...

≥ 3: 5, 6, 9, 8, 3, 4, 5, 7, 8, 3, 4, 5, 6, ...

and so on.

Let $X_j^i$ be the jth element of subsequence i. That is, $X_j^i$ is the jth data value that is
at least as large as i. An important observation is that i is a k-record value if and only
if $X_k^i = i$. That is, i will be a k-record value if and only if the kth value to be at least
as large as i is equal to i. (For instance, for the preceding data, since the fifth value to
be at least as large as 3 is equal to 3 it follows that 3 is a five-record value.) Now, it is
not difficult to see that, independent of which values in the first subsequence are equal
to 1, the values in the second subsequence are independent and identically distributed
according to the mass function

$$P\{\text{value in second subsequence} = j\} = P\{X = j \mid X \ge 2\}, \quad j \ge 2$$

Similarly, independent of which values in the first subsequence are equal to 1 and which
values in the second subsequence are equal to 2, the values in the third subsequence
are independent and identically distributed according to the mass function

$$P\{\text{value in third subsequence} = j\} = P\{X = j \mid X \ge 3\}, \quad j \ge 3$$

and so on. It therefore follows that the events $\{X_j^i = i\}$, $i \ge 1$, $j \ge 1$, are independent
and

$$P\{i \text{ is a } k\text{-record value}\} = P\{X_k^i = i\} = P\{X = i \mid X \ge i\}$$

It now follows from the independence of the events $\{X_k^i = i\}$, $i \ge 1$, and the fact that
P{i is a k-record value} does not depend on k, that $R_k$ has the same distribution for
all $k \ge 1$. In addition, it follows from the independence of the events $\{X_k^i = i\}$, that
the random vectors $R_k$, $k \ge 1$, are also independent. ■
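
The definition of a k-record value can be applied directly to the data sequence used in the proof. The sketch below is illustrative only; the helper names `record_orders` and `k_record_values` are hypothetical, and on a finite list it can return only the k-record values that have appeared so far.

```python
def record_orders(data):
    """For each position n, the k for which data[n] is a k-record value."""
    orders = []
    for n, x in enumerate(data):
        # count the values among the first n + 1 that are at least as large as x
        orders.append(sum(1 for y in data[: n + 1] if y >= x))
    return orders

def k_record_values(data, k):
    """The k-record values of `data`, in order of appearance."""
    return [x for x, order in zip(data, record_orders(data)) if order == k]

data = [2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1]
print(k_record_values(data, 1))   # the ordinary (strict) record values
print(k_record_values(data, 5))   # includes 3, the five-record value noted in the proof
```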

Suppose now that the $X_i$, $i \ge 1$ are independent finite-valued random variables with
probability mass function

$$p_i = P\{X = i\}, \quad i = 1, \ldots, m$$

and let

$$T = \min\{n : X_i \ge X_n \text{ for exactly } k \text{ of the values } i,\ i = 1, \ldots, n\}$$

denote the first k-record index. We will now determine its mean.

Proposition 3.3 Let $\lambda_i = p_i / \sum_{j=i}^{m} p_j$, i = 1, ..., m. Then

$$E[T] = k + (k - 1) \sum_{i=1}^{m-1} \lambda_i$$
Proof. To begin, suppose that the observed random variables $X_1, X_2, \ldots$ take on one
of the values i, i + 1, ..., m with respective probabilities

$$P\{X = j\} = \frac{p_j}{p_i + \cdots + p_m}, \quad j = i, \ldots, m$$

Let $T_i$ denote the first k-record index when the observed data have the preceding mass
function, and note that since each data value is at least i it follows that the k-record
value will equal i, and $T_i$ will equal k, if $X_k = i$. As a result,

$$E[T_i \mid X_k = i] = k$$

On the other hand, if $X_k > i$ then the k-record value will exceed i, and so all data
values equal to i can be disregarded when searching for the k-record value. In addition,
since each data value greater than i will have probability mass function

$$P\{X = j \mid X > i\} = \frac{p_j}{p_{i+1} + \cdots + p_m}, \quad j = i + 1, \ldots, m$$

it follows that the total number of data values greater than i that need be observed until
a k-record value appears has the same distribution as $T_{i+1}$. Hence,

$$E[T_i \mid X_k > i] = E[T_{i+1} + N_i \mid X_k > i]$$

where $T_{i+1}$ is the total number of variables greater than i that we need observe to obtain
a k-record, and $N_i$ is the number of values equal to i that are observed in that time.
Now, given that $X_k > i$ and that $T_{i+1} = n$ ($n \ge k$) it follows that the time to observe
$T_{i+1}$ values greater than i has the same distribution as the number of trials to obtain n
successes given that trial k is a success and that each trial is independently a success
with probability $1 - p_i / \sum_{j \ge i} p_j = 1 - \lambda_i$. Thus, since the number of trials needed
to obtain a success is a geometric random variable with mean $1/(1 - \lambda_i)$, we see that

$$E[T_i \mid T_{i+1}, X_k > i] = 1 + \frac{T_{i+1} - 1}{1 - \lambda_i} = \frac{T_{i+1} - \lambda_i}{1 - \lambda_i}$$

Taking expectations gives

$$E[T_i \mid X_k > i] = E\left[\frac{T_{i+1} - \lambda_i}{1 - \lambda_i} \,\Big|\, X_k > i\right] = \frac{E[T_{i+1}] - \lambda_i}{1 - \lambda_i}$$

Thus, upon conditioning on whether $X_k = i$, we obtain

$$\begin{aligned}
E[T_i] &= E[T_i \mid X_k = i]\,\lambda_i + E[T_i \mid X_k > i](1 - \lambda_i) \\
&= (k - 1)\lambda_i + E[T_{i+1}]
\end{aligned}$$
Starting with $E[T_m] = k$, the preceding gives

$$\begin{aligned}
E[T_{m-1}] &= (k - 1)\lambda_{m-1} + k \\
E[T_{m-2}] &= (k - 1)\lambda_{m-2} + (k - 1)\lambda_{m-1} + k = (k - 1) \sum_{j=m-2}^{m-1} \lambda_j + k \\
E[T_{m-3}] &= (k - 1)\lambda_{m-3} + (k - 1) \sum_{j=m-2}^{m-1} \lambda_j + k = (k - 1) \sum_{j=m-3}^{m-1} \lambda_j + k
\end{aligned}$$

In general,

$$E[T_i] = (k - 1) \sum_{j=i}^{m-1} \lambda_j + k$$

and the result follows since $T = T_1$. ■
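
Proposition 3.3 is straightforward to evaluate numerically. The following is a minimal sketch (the function name `mean_first_k_record_index` is hypothetical; `p` lists $p_1, \ldots, p_m$ in order) that accumulates $\lambda_i = p_i / \sum_{j \ge i} p_j$ for i = 1, ..., m - 1 using exact fractions.

```python
from fractions import Fraction

def mean_first_k_record_index(p, k):
    """E[T] = k + (k - 1) * sum_{i=1}^{m-1} lambda_i, lambda_i = p_i / sum_{j>=i} p_j."""
    m = len(p)
    total = Fraction(k)
    tail = sum(Fraction(q) for q in p)     # sum_{j >= i} p_j, starting at i = 1
    for i in range(m - 1):                 # i = 1, ..., m - 1 in the text's indexing
        total += (k - 1) * Fraction(p[i]) / tail
        tail -= Fraction(p[i])
    return total

# Two equally likely values, k = 2: lambda_1 = 1/2, so E[T] = 2 + 1/2.
print(mean_first_k_record_index([Fraction(1, 2), Fraction(1, 2)], 2))
```

In this two-value case the answer can be confirmed by hand: T = 2 unless the first two values are 1, 2 in that order, in which case T is the time of the second 2, with conditional mean 4; hence E[T] = (3/4)(2) + (1/4)(4) = 5/2.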

3.6.6 Left Skip Free Random Walks 

Let $X_i$, $i \ge 1$ be independent and identically distributed random variables. Let $P_j =
P(X_i = j)$ and suppose that

$$\sum_{j=-1}^{\infty} P_j = 1$$

That is, the possible values of the $X_i$ are −1, 0, 1, .... If we take

$$S_0 = 0, \qquad S_n = \sum_{i=1}^{n} X_i$$

then the sequence of random variables $S_n$, $n \ge 0$ is called a left skip free random walk.
(It is called left skip free because $S_n$ can decrease from $S_{n-1}$ by at most 1.)

For an application consider a gambler who makes a sequence of identical bets, for
which he can lose at most 1 on each bet. If $X_i$ represents the gambler’s winnings
on bet i, then $S_n$ represents his total winnings after the first n bets.

Suppose that the gambler is playing in an unfair game, in the sense that $E[X_i] < 0$,
and let $v = -E[X_i]$. Also, let $T_0 = 0$, and for k > 0, let $T_{-k}$ denote the number of
bets until the gambler is losing k. That is,

$$T_{-k} = \min\{n : S_n = -k\}$$

It should be noted that $T_{-k} < \infty$; that is, the random walk will eventually hit −k. This
is so because, by the strong law of large numbers, $S_n/n \to E[X_i] < 0$, which implies
that $S_n \to -\infty$. We are interested in determining $E[T_{-k}]$ and $\text{Var}(T_{-k})$. (It can be
shown that both are finite when $E[X_i] < 0$.)

The key to the analysis is to note that the number of bets until one’s fortune decreases
by k can be expressed as the number of bets until it decreases by 1 (namely, $T_{-1}$), plus
the additional number of bets after the decrease is 1 until the total decrease is 2 (namely,
$T_{-2} - T_{-1}$), plus the additional number of bets after the decrease is 2 until it is 3 (namely,
$T_{-3} - T_{-2}$), and so on. That is,

$$T_{-k} = T_{-1} + \sum_{j=2}^{k} \left(T_{-j} - T_{-(j-1)}\right)$$

However, because the results of all bets are independent and identically distributed, it
follows that $T_{-1}, T_{-2} - T_{-1}, T_{-3} - T_{-2}, \ldots, T_{-k} - T_{-(k-1)}$ are all independent and
identically distributed. (That is, starting at any instant, the number of additional bets
until the gambler’s fortune is one less than it is at that instant is independent of prior
results and has the same distribution as $T_{-1}$.) Consequently, the mean and variance of
$T_{-k}$, the sum of these k random variables, are

$$E[T_{-k}] = k\,E[T_{-1}]$$

and

$$\text{Var}(T_{-k}) = k\,\text{Var}(T_{-1})$$

We now compute the mean and variance of $T_{-1}$ by conditioning on $X_1$, the result
of the initial bet. Now, given $X_1$, $T_{-1}$ is equal to 1 plus the number of bets it takes
until the gambler’s fortune decreases by $X_1 + 1$ from what it is after the initial bet.
Consequently, given $X_1$, $T_{-1}$ has the same distribution as $1 + T_{-(X_1+1)}$. Hence,

$$\begin{aligned}
E[T_{-1} \mid X_1] &= 1 + E[T_{-(X_1+1)}] = 1 + (X_1 + 1)\,E[T_{-1}] \\
\text{Var}(T_{-1} \mid X_1) &= \text{Var}(T_{-(X_1+1)}) = (X_1 + 1)\,\text{Var}(T_{-1})
\end{aligned}$$

Consequently,

$$E[T_{-1}] = E\big[E[T_{-1} \mid X_1]\big] = 1 + (-v + 1)\,E[T_{-1}]$$

or

$$E[T_{-1}] = \frac{1}{v}$$

which shows that

$$E[T_{-k}] = \frac{k}{v} \tag{3.32}$$

Similarly, with $\sigma^2 = \text{Var}(X_1)$, the conditional variance formula yields

$$\begin{aligned}
\text{Var}(T_{-1}) &= E[(X_1 + 1)\,\text{Var}(T_{-1})] + \text{Var}(X_1 E[T_{-1}]) \\
&= (1 - v)\,\text{Var}(T_{-1}) + (E[T_{-1}])^2 \sigma^2 \\
&= (1 - v)\,\text{Var}(T_{-1}) + \frac{\sigma^2}{v^2}
\end{aligned}$$

thus showing that

$$\text{Var}(T_{-1}) = \frac{\sigma^2}{v^3}$$

and yielding the result

$$\text{Var}(T_{-k}) = \frac{k\sigma^2}{v^3} \tag{3.33}$$

There are many interesting results about skip free random walks. One of them is the
hitting time theorem.

Proposition 3.4 (The Hitting Time Theorem) 

$$P(T_{-k} = n) = \frac{k}{n}\, P(S_n = -k), \quad n \ge 1$$

Proof. The proof is by induction on n. Now, when n = 1 we must prove

$$P(T_{-k} = 1) = k\,P(S_1 = -k)$$

However, the preceding is true when k = 1 because

$$P(T_{-1} = 1) = P(S_1 = -1) = P_{-1}$$

and it is true when k > 1 because

$$P(T_{-k} = 1) = 0 = P(S_1 = -k), \quad k > 1$$

Thus the result is true when n = 1. So assume that for a fixed value n > 1 and all
k > 0

$$P(T_{-k} = n - 1) = \frac{k}{n - 1}\, P(S_{n-1} = -k) \tag{3.34}$$

Now consider $P(T_{-k} = n)$. Conditioning on $X_1$ yields

$$P(T_{-k} = n) = \sum_{j=-1}^{\infty} P(T_{-k} = n \mid X_1 = j)\, P_j$$

Now, if the gambler wins j on his initial bet, then the first time that he is down k will
occur after bet n if the first time that his cumulative losses after the initial gamble is
k + j occurs after an additional n − 1 bets. That is,

$$P(T_{-k} = n \mid X_1 = j) = P(T_{-(k+j)} = n - 1)$$
Consequently,

$$\begin{aligned}
P(T_{-k} = n) &= \sum_{j=-1}^{\infty} P(T_{-k} = n \mid X_1 = j)\, P_j \\
&= \sum_{j=-1}^{\infty} P(T_{-(k+j)} = n - 1)\, P_j \\
&= \sum_{j=-1}^{\infty} \frac{k + j}{n - 1}\, P\{S_{n-1} = -(k + j)\}\, P_j
\end{aligned}$$

where the last equality follows by Induction Hypothesis (3.34). Using that

$$P(S_n = -k \mid X_1 = j) = P\{S_{n-1} = -(k + j)\}$$

the preceding yields

$$\begin{aligned}
P(T_{-k} = n) &= \sum_{j=-1}^{\infty} \frac{k + j}{n - 1}\, P(S_n = -k \mid X_1 = j)\, P_j \\
&= \sum_{j=-1}^{\infty} \frac{k + j}{n - 1}\, P(S_n = -k,\ X_1 = j) \\
&= \sum_{j=-1}^{\infty} \frac{k + j}{n - 1}\, P(X_1 = j \mid S_n = -k)\, P(S_n = -k) \\
&= P(S_n = -k) \left\{ \frac{k}{n - 1} \sum_{j=-1}^{\infty} P(X_1 = j \mid S_n = -k) + \frac{1}{n - 1} \sum_{j=-1}^{\infty} j\, P(X_1 = j \mid S_n = -k) \right\} \\
&= P(S_n = -k) \left\{ \frac{k}{n - 1} + \frac{1}{n - 1}\, E[X_1 \mid S_n = -k] \right\}
\end{aligned} \tag{3.35}$$

However,

$$\begin{aligned}
-k &= E[S_n \mid S_n = -k] \\
&= E[X_1 + \cdots + X_n \mid S_n = -k] \\
&= \sum_{i=1}^{n} E[X_i \mid S_n = -k] \\
&= n\,E[X_1 \mid S_n = -k]
\end{aligned}$$

where the final equation follows because $X_1, \ldots, X_n$ are independent and identically
distributed and thus the distribution of $X_i$ given that $X_1 + \cdots + X_n = -k$ is the same
for all i. Hence,

$$E[X_1 \mid S_n = -k] = -\frac{k}{n}$$

Substituting the preceding into (3.35) gives

$$P(T_{-k} = n) = P(S_n = -k) \left( \frac{k}{n - 1} - \frac{1}{n - 1} \cdot \frac{k}{n} \right) = \frac{k}{n}\, P(S_n = -k)$$

and completes the proof. ■
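
The hitting time theorem can be confirmed exactly for a small example. The sketch below is not from the text: the step distribution `steps` is an arbitrary illustrative choice with mean −1/5, and the function names are hypothetical. It computes $P(S_n = -k)$ by repeated convolution and $P(T_{-k} = n)$ by following only those walks that have not yet reached −k. It relies on the left skip free assumption that no step is smaller than −1, so a walk above −k cannot jump past −k.

```python
from fractions import Fraction

def walk_distribution(steps, n):
    """Exact distribution of S_n as a dict {position: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for pos, pr in dist.items():
            for step, q in steps.items():
                nxt[pos + step] = nxt.get(pos + step, Fraction(0)) + pr * q
        dist = nxt
    return dist

def first_passage_probabilities(steps, k, n_max):
    """P(T_{-k} = n) for n = 1..n_max, tracking walks that have not yet hit -k."""
    alive = {0: Fraction(1)}
    result = {}
    for n in range(1, n_max + 1):
        nxt, hit = {}, Fraction(0)
        for pos, pr in alive.items():
            for step, q in steps.items():
                if pos + step == -k:
                    hit += pr * q          # first visit to -k happens at bet n
                else:
                    nxt[pos + step] = nxt.get(pos + step, Fraction(0)) + pr * q
        alive, result[n] = nxt, hit
    return result

steps = {-1: Fraction(3, 5), 0: Fraction(1, 5), 2: Fraction(1, 5)}
k = 2
passage = first_passage_probabilities(steps, k, 8)
for n in range(1, 9):
    rhs = Fraction(k, n) * walk_distribution(steps, n).get(-k, Fraction(0))
    print(n, passage[n], rhs)              # the two columns should agree
```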

Suppose that after n bets the gambler is down k. Then the conditional probability 
that this is the first time he has ever been down k is 


$$\begin{aligned}
P(T_{-k} = n \mid S_n = -k) &= \frac{P(T_{-k} = n,\ S_n = -k)}{P(S_n = -k)} \\
&= \frac{P(T_{-k} = n)}{P(S_n = -k)} \\
&= \frac{k}{n} \qquad \text{(by the hitting time theorem)}
\end{aligned}$$


Let us suppose for the remainder of this section that —v = E[X] < 0. Combining 
the hitting time theorem with our previously derived result about E[T-k] gives the 
following: 

$$\begin{aligned}
\frac{k}{v} &= E[T_{-k}] \\
&= \sum_{n=1}^{\infty} n\,P(T_{-k} = n) \\
&= \sum_{n=1}^{\infty} k\,P(S_n = -k)
\end{aligned}$$

where the final equality used the hitting time theorem. Hence,

$$\sum_{n=1}^{\infty} P(S_n = -k) = \frac{1}{v}$$

Let $I_n$ be an indicator variable for the event that $S_n = -k$. That is, let

$$I_n = \begin{cases} 1, & \text{if } S_n = -k \\ 0, & \text{if } S_n \ne -k \end{cases}$$

and note that

$$\text{total time gambler’s fortune is } -k = \sum_{n=1}^{\infty} I_n$$

Taking expectations gives

$$E[\text{total time gambler’s fortune is } -k] = \sum_{n=1}^{\infty} P(S_n = -k) = \frac{1}{v} \tag{3.36}$$


Now, let $\alpha$ be the probability that the random walk is always negative after the initial
movement. That is,

$$\alpha = P(S_n < 0 \text{ for all } n \ge 1)$$

To determine $\alpha$ note that each time the gambler’s fortune is −k the probability that it will
never again hit −k (because all cumulative winnings starting at that time are negative)
is $\alpha$. Hence, the number of times that the gambler’s fortune is −k is a geometric random
variable with parameter $\alpha$, and thus has mean $1/\alpha$. Consequently, from (3.36)

$$\alpha = v$$


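Let us now define $L_{-k}$ to equal the last time that the random walk hits −k. Because
$L_{-k}$ will equal n if $S_n = -k$ and the sequence of cumulative winnings from time n
onwards is always negative, we see that

$$P(L_{-k} = n) = P(S_n = -k)\,\alpha = P(S_n = -k)\,v$$

Hence,

$$\begin{aligned}
E[L_{-k}] &= \sum_{n=0}^{\infty} n\,P(L_{-k} = n) \\
&= v \sum_{n=0}^{\infty} n\,P(S_n = -k) \\
&= v \sum_{n=0}^{\infty} n\, \frac{n}{k}\, P(T_{-k} = n) \qquad \text{by the hitting time theorem} \\
&= \frac{v}{k} \sum_{n=0}^{\infty} n^2\, P(T_{-k} = n) \\
&= \frac{v}{k}\, E[T_{-k}^2] \\
&= \frac{v}{k} \left\{ E^2[T_{-k}] + \text{Var}(T_{-k}) \right\} \\
&= \frac{k}{v} + \frac{\sigma^2}{v^2}
\end{aligned}$$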


3.7 An Identity for Compound Random Variables 

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, and let $S_n = \sum_{i=1}^{n} X_i$ be the sum of the first n of them, $n \ge 0$, where $S_0 = 0$.
Recall that if N is a nonnegative integer valued random variable that is independent of
the sequence $X_1, X_2, \ldots$ then

$$S_N = \sum_{i=1}^{N} X_i$$

is said to be a compound random variable, with the distribution of N called the
compounding distribution. In this subsection we will first derive an identity involving such
random variables. We will then specialize to where the $X_i$ are positive integer
valued random variables, prove a corollary of the identity, and then use this corollary to
develop a recursive formula for the probability mass function of $S_N$, for a variety of
common compounding distributions.

To begin, let M be a random variable that is independent of the sequence $X_1, X_2, \ldots$,
and which is such that

$$P\{M = n\} = \frac{n\,P\{N = n\}}{E[N]}, \quad n = 1, 2, \ldots$$

Proposition 3.5 (The Compound Random Variable Identity) For any function h

$$E[S_N h(S_N)] = E[N]\,E[X_1 h(S_M)]$$


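Proof.

$$\begin{aligned}
E[S_N h(S_N)] &= E\left[\sum_{i=1}^{N} X_i h(S_N)\right] \\
&= \sum_{n=0}^{\infty} E\left[\sum_{i=1}^{N} X_i h(S_N) \,\Big|\, N = n\right] P\{N = n\} \qquad \text{(by conditioning on } N\text{)} \\
&= \sum_{n=0}^{\infty} E\left[\sum_{i=1}^{n} X_i h(S_n) \,\Big|\, N = n\right] P\{N = n\} \\
&= \sum_{n=0}^{\infty} E\left[\sum_{i=1}^{n} X_i h(S_n)\right] P\{N = n\} \qquad \text{(by independence of } N \text{ and } X_1, \ldots, X_n\text{)} \\
&= \sum_{n=0}^{\infty} \sum_{i=1}^{n} E[X_i h(S_n)]\, P\{N = n\}
\end{aligned}$$

Now, because $X_1, \ldots, X_n$ are independent and identically distributed, and $h(S_n) =
h(X_1 + \cdots + X_n)$ is a symmetric function of $X_1, \ldots, X_n$, it follows that the distribution
of $X_i h(S_n)$ is the same for all i = 1, ..., n. Therefore, continuing the preceding string
of equalities yields

$$\begin{aligned}
E[S_N h(S_N)] &= \sum_{n=0}^{\infty} n\,E[X_1 h(S_n)]\, P\{N = n\} \\
&= E[N] \sum_{n=0}^{\infty} E[X_1 h(S_n)]\, P\{M = n\} \qquad \text{(definition of } M\text{)} \\
&= E[N] \sum_{n=0}^{\infty} E[X_1 h(S_n) \mid M = n]\, P\{M = n\} \qquad \text{(independence of } M \text{ and } X_1, \ldots, X_n\text{)} \\
&= E[N] \sum_{n=0}^{\infty} E[X_1 h(S_M) \mid M = n]\, P\{M = n\} \\
&= E[N]\,E[X_1 h(S_M)]
\end{aligned}$$

which proves the proposition. ■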
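
As a sanity check on Proposition 3.5, both sides of the identity can be computed exactly when N and the $X_i$ take finitely many values. The sketch below is illustrative only: the helper names `expectation` and `both_sides`, and the particular distributions and function h used, are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import product

def expectation(count_pmf, x_pmf, g):
    """E[g(X_1, ..., X_K)], with K ~ count_pmf independent of the i.i.d. X_i ~ x_pmf."""
    total = Fraction(0)
    for n, pn in count_pmf.items():
        for xs in product(x_pmf, repeat=n):
            pr = pn
            for x in xs:
                pr *= x_pmf[x]
            total += pr * g(xs)
    return total

def both_sides(n_pmf, x_pmf, h):
    """Return (E[S_N h(S_N)], E[N] E[X_1 h(S_M)])."""
    mean_n = sum(n * pn for n, pn in n_pmf.items())
    # P{M = n} = n P{N = n} / E[N], n = 1, 2, ...
    m_pmf = {n: n * pn / mean_n for n, pn in n_pmf.items() if n >= 1}
    lhs = expectation(n_pmf, x_pmf, lambda xs: sum(xs) * h(sum(xs)))
    rhs = mean_n * expectation(m_pmf, x_pmf, lambda xs: xs[0] * h(sum(xs)))
    return lhs, rhs

n_pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}
x_pmf = {1: Fraction(1, 3), 2: Fraction(2, 3)}
print(both_sides(n_pmf, x_pmf, lambda s: s * s + 1))   # the two entries should be equal
```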

Suppose now that the $X_i$ are positive integer valued random variables, and let

$$\alpha_j = P\{X_i = j\}, \quad j > 0$$

The successive values of $P\{S_N = k\}$ can often be obtained from the following corollary
to Proposition 3.5.

Corollary 3.6

$$\begin{aligned}
P\{S_N = 0\} &= P\{N = 0\} \\
P\{S_N = k\} &= \frac{1}{k}\, E[N] \sum_{j=1}^{k} j\,\alpha_j\, P\{S_{M-1} = k - j\}, \quad k > 0
\end{aligned}$$

Proof. For k fixed, let

$$h(x) = \begin{cases} 1, & \text{if } x = k \\ 0, & \text{if } x \ne k \end{cases}$$

and note that $S_N h(S_N)$ is either equal to k if $S_N = k$ or is equal to 0 otherwise.
Therefore,

$$E[S_N h(S_N)] = k\,P\{S_N = k\}$$

and the compound identity yields

$$\begin{aligned}
k\,P\{S_N = k\} &= E[N]\,E[X_1 h(S_M)] \\
&= E[N] \sum_{j=1}^{\infty} E[X_1 h(S_M) \mid X_1 = j]\,\alpha_j \\
&= E[N] \sum_{j=1}^{\infty} j\,E[h(S_M) \mid X_1 = j]\,\alpha_j \\
&= E[N] \sum_{j=1}^{\infty} j\,P\{S_M = k \mid X_1 = j\}\,\alpha_j
\end{aligned} \tag{3.37}$$

Now,

$$\begin{aligned}
P\{S_M = k \mid X_1 = j\} &= P\left\{\sum_{i=1}^{M} X_i = k \,\Big|\, X_1 = j\right\} \\
&= P\left\{j + \sum_{i=2}^{M} X_i = k \,\Big|\, X_1 = j\right\} \\
&= P\left\{j + \sum_{i=2}^{M} X_i = k\right\} \\
&= P\left\{j + \sum_{i=1}^{M-1} X_i = k\right\} \\
&= P\{S_{M-1} = k - j\}
\end{aligned}$$

The next to last equality followed because $X_2, \ldots, X_M$ and $X_1, \ldots, X_{M-1}$ have the
same joint distribution; namely that of M − 1 independent random variables that all
have the distribution of $X_1$, where M − 1 is independent of these random variables.
Thus the proof follows from Equation (3.37). ■

When the distributions of M − 1 and N are related, the preceding corollary can be
a useful recursion for computing the probability mass function of $S_N$, as is illustrated
in the following subsections.


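3.7.1 Poisson Compounding Distribution

If N is the Poisson distribution with mean $\lambda$, then

$$\begin{aligned}
P\{M - 1 = n\} &= P\{M = n + 1\} \\
&= \frac{(n + 1)\,P\{N = n + 1\}}{E[N]} \\
&= \frac{1}{\lambda}\,(n + 1)\, e^{-\lambda}\, \frac{\lambda^{n+1}}{(n + 1)!} \\
&= e^{-\lambda}\, \frac{\lambda^n}{n!}
\end{aligned}$$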
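Consequently, M − 1 is also Poisson with mean $\lambda$. Thus, with

$$P_n = P\{S_N = n\}$$

the recursion given by Corollary 3.6 can be written

$$\begin{aligned}
P_0 &= e^{-\lambda} \\
P_k &= \frac{\lambda}{k} \sum_{j=1}^{k} j\,\alpha_j\, P_{k-j}, \quad k > 0
\end{aligned}$$

Remark When the $X_i$ are identically 1, the preceding recursion reduces to the
well-known identity for a Poisson random variable having mean $\lambda$:

$$\begin{aligned}
P\{N = 0\} &= e^{-\lambda} \\
P\{N = n\} &= \frac{\lambda}{n}\, P\{N = n - 1\}, \quad n \ge 1
\end{aligned}$$

Example 3.34 Let S be a compound Poisson random variable with $\lambda = 4$ and

$$P\{X_i = i\} = 1/4, \quad i = 1, 2, 3, 4$$

Let us use the recursion given by Corollary 3.6 to determine P{S = 5}. It gives

$$\begin{aligned}
P_0 &= e^{-\lambda} = e^{-4} \\
P_1 &= \lambda \alpha_1 P_0 = e^{-4} \\
P_2 &= \frac{\lambda}{2}(\alpha_1 P_1 + 2\alpha_2 P_0) = \frac{3}{2}\, e^{-4} \\
P_3 &= \frac{\lambda}{3}(\alpha_1 P_2 + 2\alpha_2 P_1 + 3\alpha_3 P_0) = \frac{13}{6}\, e^{-4} \\
P_4 &= \frac{\lambda}{4}(\alpha_1 P_3 + 2\alpha_2 P_2 + 3\alpha_3 P_1 + 4\alpha_4 P_0) = \frac{73}{24}\, e^{-4} \\
P_5 &= \frac{\lambda}{5}(\alpha_1 P_4 + 2\alpha_2 P_3 + 3\alpha_3 P_2 + 4\alpha_4 P_1 + 5\alpha_5 P_0) = \frac{381}{120}\, e^{-4}
\end{aligned}$$

where the last line uses $\alpha_5 = 0$.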
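
The recursion of Example 3.34 is easy to mechanize. The following is a minimal sketch, with the hypothetical function name `compound_poisson_coefficients`; it returns the rational coefficients $c_k$ for which $P_k = c_k e^{-\lambda}$, so that the arithmetic stays exact, and it treats any $\alpha_j$ missing from the dictionary as 0.

```python
from fractions import Fraction
from math import exp

def compound_poisson_coefficients(lam, alpha, k_max):
    """Return [c_0, ..., c_kmax] with P{S_N = k} = c_k * exp(-lam).

    `alpha` maps j >= 1 to alpha_j = P{X = j}; missing keys mean alpha_j = 0.
    """
    c = [Fraction(1)]                      # P_0 = e^{-lam}
    for k in range(1, k_max + 1):
        s = sum(j * alpha.get(j, Fraction(0)) * c[k - j] for j in range(1, k + 1))
        c.append(Fraction(lam, k) * s)     # P_k = (lam / k) * sum_j j alpha_j P_{k-j}
    return c

alpha = {j: Fraction(1, 4) for j in range(1, 5)}
coeffs = compound_poisson_coefficients(4, alpha, 5)
print(coeffs)                              # coefficients of e^{-4} for P_0, ..., P_5
print(float(coeffs[5]) * exp(-4))          # numerical value of P{S = 5}
```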

3.7.2 Binomial Compounding Distribution 

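Suppose that N is a binomial random variable with parameters r and p. Then,

$$\begin{aligned}
P\{M - 1 = n\} &= \frac{(n + 1)\,P\{N = n + 1\}}{E[N]} \\
&= \frac{n + 1}{rp} \binom{r}{n + 1} p^{n+1} (1 - p)^{r-n-1} \\
&= \frac{n + 1}{rp}\, \frac{r!}{(r - 1 - n)!\,(n + 1)!}\, p^{n+1} (1 - p)^{r-1-n} \\
&= \frac{(r - 1)!}{(r - 1 - n)!\, n!}\, p^n (1 - p)^{r-1-n}
\end{aligned}$$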
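Thus, M − 1 is a binomial random variable with parameters r − 1, p.

Fixing p, let N(r) be a binomial random variable with parameters r and p, and let

$$P_r(k) = P\{S_{N(r)} = k\}$$

Then, Corollary 3.6 yields

$$\begin{aligned}
P_r(0) &= (1 - p)^r \\
P_r(k) &= \frac{rp}{k} \sum_{j=1}^{k} j\,\alpha_j\, P_{r-1}(k - j), \quad k > 0
\end{aligned}$$

For instance, letting k equal 1, then 2, and then 3 gives

$$\begin{aligned}
P_r(1) &= rp\,\alpha_1 (1 - p)^{r-1} \\
P_r(2) &= \frac{rp}{2}\left[\alpha_1 P_{r-1}(1) + 2\alpha_2 P_{r-1}(0)\right] \\
&= \frac{rp}{2}\left[(r - 1)p\,\alpha_1^2 (1 - p)^{r-2} + 2\alpha_2 (1 - p)^{r-1}\right] \\
P_r(3) &= \frac{rp}{3}\left[\alpha_1 P_{r-1}(2) + 2\alpha_2 P_{r-1}(1) + 3\alpha_3 P_{r-1}(0)\right] \\
&= \frac{\alpha_1 rp}{3}\, \frac{(r - 1)p}{2}\left[(r - 2)p\,\alpha_1^2 (1 - p)^{r-3} + 2\alpha_2 (1 - p)^{r-2}\right] \\
&\quad + \frac{2}{3}\,\alpha_2\, rp\,(r - 1)p\,\alpha_1 (1 - p)^{r-2} + \alpha_3\, rp\,(1 - p)^{r-1}
\end{aligned}$$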

3.7.3 A Compounding Distribution Related to the Negative Binomial

Suppose, for a fixed value of p, 0 < p < 1, the compounding random variable N has
a probability mass function

$$P\{N = n\} = \binom{n + r - 1}{r - 1} p^r (1 - p)^n, \quad n = 0, 1, \ldots$$

Such a random variable can be thought of as being the number of failures that occur
before a total of r successes have been amassed when each trial is independently a
success with probability p. (There will be n such failures if the rth success occurs
on trial n + r. Consequently, N + r is a negative binomial random variable with
parameters r and p.) Using that the mean of the negative binomial random variable
N + r is E[N + r] = r/p, we see that $E[N] = r\,\frac{1 - p}{p}$.
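Regard p as fixed, and call N an NB(r) random variable. The random variable M − 1
has probability mass function

$$\begin{aligned}
P\{M - 1 = n\} &= \frac{(n + 1)\,P\{N = n + 1\}}{E[N]} \\
&= \frac{(n + 1)\,p}{r(1 - p)} \binom{n + r}{r - 1} p^r (1 - p)^{n+1} \\
&= \frac{(n + r)!}{r!\, n!}\, p^{r+1} (1 - p)^n \\
&= \binom{n + r}{r} p^{r+1} (1 - p)^n
\end{aligned}$$

In other words, M − 1 is an NB(r + 1) random variable.
Letting, for an NB(r) random variable N,

$$P_r(k) = P\{S_N = k\}$$

Corollary 3.6 yields

$$\begin{aligned}
P_r(0) &= p^r \\
P_r(k) &= \frac{r(1 - p)}{kp} \sum_{j=1}^{k} j\,\alpha_j\, P_{r+1}(k - j), \quad k > 0
\end{aligned}$$

Thus,

$$\begin{aligned}
P_r(1) &= \frac{r(1 - p)}{p}\, \alpha_1 P_{r+1}(0) \\
&= r p^r (1 - p)\,\alpha_1, \\
P_r(2) &= \frac{r(1 - p)}{2p}\left[\alpha_1 P_{r+1}(1) + 2\alpha_2 P_{r+1}(0)\right] \\
&= \frac{r(1 - p)}{2p}\left[\alpha_1^2 (r + 1) p^{r+1} (1 - p) + 2\alpha_2\, p^{r+1}\right], \\
P_r(3) &= \frac{r(1 - p)}{3p}\left[\alpha_1 P_{r+1}(2) + 2\alpha_2 P_{r+1}(1) + 3\alpha_3 P_{r+1}(0)\right]
\end{aligned}$$

and so on.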
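
Here the recursion moves upward in r while k decreases, so it terminates after at most k levels. The sketch below is illustrative (the name `compound_nb_pmf` is hypothetical, and `alpha` again maps j to $\alpha_j$ with missing keys read as 0); it memoizes the intermediate values $P_{r+1}(\cdot), P_{r+2}(\cdot), \ldots$ that the recursion needs.

```python
from fractions import Fraction
from functools import lru_cache

def compound_nb_pmf(r, p, alpha, k):
    """P_r(k) = P{S_N = k} when N is NB(r), via the recursion from Corollary 3.6."""
    @lru_cache(maxsize=None)
    def P(r_, k_):
        if k_ == 0:
            return p ** r_                  # P_r(0) = p^r
        s = sum(j * alpha.get(j, 0) * P(r_ + 1, k_ - j) for j in range(1, k_ + 1))
        return Fraction(r_) * (1 - p) * s / (k_ * p)
    return P(r, k)

p = Fraction(1, 3)
alpha = {1: Fraction(1, 2), 2: Fraction(1, 2)}
print([compound_nb_pmf(2, p, alpha, k) for k in range(4)])
```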


Exercises 

1. If X and Y are both discrete, show that $\sum_x p_{X|Y}(x|y) = 1$ for all y such that
$p_Y(y) > 0$.

*2. Let $X_1$ and $X_2$ be independent geometric random variables having the same
parameter p. Guess the value of

$$P\{X_1 = i \mid X_1 + X_2 = n\}$$

Hint: Suppose a coin having probability p of coming up heads is continually
flipped. If the second head occurs on flip number n, what is the conditional
probability that the first head was on flip number i, i = 1, ..., n − 1?

Verify your guess analytically.

3. The joint probability mass function of X and Y, p(x , y), is given by 

*>(1,1) = 9 , *>(2,1) = $, pi 3,1) = $, 

P( 1, 2) = i, pi2,2) = 0, p(3, 2) = yj, 

*>(1,3) = 0, p{ 2,3)= l p{3,3) = \ 

Compute E[X\Y — i ] for i = 1, 2, 3. 

4. In Exercise 3, are the random variables X and Y independent? 

5. An urn contains three white, six red, and five black balls. Six of these balls are 
randomly selected from the urn. Let X and Y denote respectively the number of 
white and black balls selected. Compute the conditional probability mass function 
of X given that Y = 3. Also compute E[X|Y = 1].

*6. Repeat Exercise 5 but under the assumption that when a ball is selected its color
is noted, and it is then replaced in the urn before the next selection is made. 

7. Suppose p(x, y, z), the joint probability mass function of the random variables 
X , Y, and Z, is given by 

Pi 1,1,1) = 5, *>(2,1,1)=?, 

*>( 1 , 1 , 2 ) = $, *>( 2 , 1 , 2 )=^, 

*>( 1 , 2 , 1 ) = ^, *>( 2 , 2 , 1 ) = 0 , 

*>( 1 , 2 , 2 ) = 0 , *>( 2 , 2 , 2 ) = $ 

What is E[X\Y = 2]? What is E[X\Y = 2, Z = 1]? 

8. An unbiased die is successively rolled. Let X and Y denote, respectively, the
number of rolls necessary to obtain a six and a five. Find (a) E[X], (b) E[X|Y = 1],
(c) E[X|Y = 5].

9. Show in the discrete case that if X and Y are independent, then 

$$E[X \mid Y = y] = E[X] \quad \text{for all } y$$

10. Suppose X and Y are independent continuous random variables. Show that

$$E[X \mid Y = y] = E[X] \quad \text{for all } y$$

11. The joint density of X and Y is

$$f(x, y) = \frac{(y^2 - x^2)}{8}\, e^{-y}, \quad 0 < y < \infty,\ -y \le x \le y$$

Show that E[X|Y = y] = 0.






12. The joint density of X and Y is given by

$$f(x, y) = \frac{e^{-x/y}\, e^{-y}}{y}, \quad 0 < x < \infty,\ 0 < y < \infty$$

Show E[X|Y = y] = y.

*13. Let X be exponential with mean $1/\lambda$; that is,

$$f_X(x) = \lambda e^{-\lambda x}, \quad 0 < x < \infty$$

Find E[X|X > 1].

14. Let X be uniform over (0, 1). Find E[X\X < y], 

15. The joint density of X and Y is given by

$$f(x, y) = \frac{e^{-y}}{y}, \quad 0 < x < y,\ 0 < y < \infty$$

Compute $E[X^2 \mid Y = y]$.

16. The random variables X and Y are said to have a bivariate normal distribution if 
their joint density function is given by 


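$$f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_x}{\sigma_x}\right)^2 - \frac{2\rho (x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y} + \left(\frac{y - \mu_y}{\sigma_y}\right)^2 \right] \right\}$$

for -∞ < x < ∞, -∞ < y < ∞, where σ_x, σ_y, μ_x, μ_y, and ρ are constants
such that -1 < ρ < 1, σ_x > 0, σ_y > 0, -∞ < μ_x < ∞, -∞ < μ_y < ∞.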

(a) Show that X is normally distributed with mean μ_x and variance σ_x^2, and Y is
normally distributed with mean μ_y and variance σ_y^2.

(b) Show that the conditional density of X given that Y = y is normal with mean
μ_x + (ρσ_x/σ_y)(y - μ_y) and variance σ_x^2(1 - ρ^2).

The quantity ρ is called the correlation between X and Y. It can be shown
that

$$\rho = \frac{E[(X - \mu_x)(Y - \mu_y)]}{\sigma_x \sigma_y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$$


17. Let Y be a gamma random variable with parameters (s, α). That is, its density is

$$f_Y(y) = C e^{-\alpha y} y^{s-1}, \qquad y > 0$$

where C is a constant that does not depend on y. Suppose also that the conditional
distribution of X given that Y = y is Poisson with mean y. That is,

$$P\{X = i \mid Y = y\} = e^{-y} y^i / i!, \qquad i \ge 0$$

Show that the conditional distribution of Y given that X = i is the gamma
distribution with parameters (s + i, α + 1).

18. Let X_1, ..., X_n be independent random variables having a common distribution
function that is specified up to an unknown parameter θ. Let T = T(X) be a function
of the data X = (X_1, ..., X_n). If the conditional distribution of X_1, ..., X_n
given T(X) does not depend on θ then T(X) is said to be a sufficient statistic for
θ. In the following cases, show that T(X) = Σ_{i=1}^n X_i is a sufficient statistic for θ.

(a) The X_i are normal with mean θ and variance 1.

(b) The density of X_i is f(x) = θe^{-θx}, x > 0.

(c) The mass function of X_i is p(x) = θ^x (1 - θ)^{1-x}, x = 0, 1, 0 < θ < 1.

(d) The X_i are Poisson random variables with mean θ.

*19. Prove that if X and Y are jointly continuous, then

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\, f_Y(y)\, dy$$


20. An individual whose level of exposure to a certain pathogen is x will contract the 
disease caused by this pathogen with probability P(x). If the exposure level of a 
randomly chosen member of the population has probability density function f,
determine the conditional probability density of the exposure level of that member 
given that he or she 

(a) has the disease. 

(b) does not have the disease. 

(c) Show that when P(x) increases in x, then the ratio of the density of part (a) 
to that of part (b) also increases in x. 

21. Consider Example 3.12, which refers to a miner trapped in a mine. Let N denote 
the total number of doors selected before the miner reaches safety. Also, let T_i
denote the travel time corresponding to the ith choice, i ≥ 1. Again let X denote
the time when the miner reaches safety.

(a) Give an identity that relates X to N and the T_i.

(b) What is E[N]?

(c) What is E[T_N]?

(d) What is E[Σ_{i=1}^N T_i | N = n]?

(e) Using the preceding, what is E[X]?

22. Suppose that independent trials, each of which is equally likely to have any of m 
possible outcomes, are performed until the same outcome occurs k consecutive 
times. If N denotes the number of trials, show that

$$E[N] = \frac{m^k - 1}{m - 1}$$

Some people believe that the successive digits in the expansion of π = 3.14159...
are “uniformly” distributed. That is, they believe that these digits have all the
appearance of being independent choices from a distribution that is equally likely
to be any of the digits from 0 through 9. Possible evidence against this hypothesis
is the fact that starting with the 24,658,601st digit there is a run of nine successive
7s. Is this information consistent with the hypothesis of a uniform distribution?

To answer this, we note from the preceding that if the uniform hypothesis were 
correct, then the expected number of digits until a run of nine of the same value 
occurs is 

(10^9 - 1)/9 = 111,111,111

Thus, the actual value of approximately 25 million is roughly 22 percent of the
theoretical mean. However, it can be shown that under the uniformity assumption
the standard deviation of N will be approximately equal to the mean. As a
result, the observed value is approximately 0.78 standard deviations less than its
theoretical mean and is thus quite consistent with the uniformity assumption.
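
The arithmetic in the preceding discussion can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the standard deviation is taken to be equal to the mean, which the text states only as an approximation.

```python
# Arithmetic behind the pi-digit discussion; the numbers come from the text.
m, k = 10, 9                       # ten possible digits, run of nine equal digits
mean_n = (m**k - 1) // (m - 1)     # expected number of digits until such a run

# The run starts at digit 24,658,601 (it is completed 8 digits later, which
# does not change the rounded figures below).
observed = 24_658_601

ratio = observed / mean_n          # observed value as a fraction of the mean
# Assumption from the text: standard deviation of N is roughly equal to its mean.
z = (mean_n - observed) / mean_n   # standard deviations below the mean

print(mean_n, round(ratio, 2), round(z, 2))
```
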

*23. A coin having probability p of coming up heads is successively flipped until two
of the most recent three flips are heads. Let N denote the number of flips. (Note
that if the first two flips are heads, then N = 2.) Find E[N].

24. A coin, having probability p of landing heads, is continually flipped until at least 
one head and one tail have been flipped. 

(a) Find the expected number of flips needed. 

(b) Find the expected number of flips that land on heads. 

(c) Find the expected number of flips that land on tails. 

(d) Repeat part (a) in the case where flipping is continued until a total of at least 
two heads and one tail have been flipped. 

25. Independent trials, resulting in one of the outcomes 1, 2, 3 with respective
probabilities p_1, p_2, p_3, Σ_{i=1}^3 p_i = 1, are performed.

(a) Let N denote the number of trials needed until the initial outcome has occurred
exactly 3 times. For instance, if the trial results are 3, 2, 1, 2, 3, 2, 3 then N = 7.
Find E[N].

(b) Find the expected number of trials needed until both outcome 1 and outcome 
2 have occurred. 

26. You have two opponents with whom you alternate play. Whenever you play A,
you win with probability p_A; whenever you play B, you win with probability p_B,
where p_B > p_A. If your objective is to minimize the expected number of games
you need to play to win two in a row, should you start with A or with B?

Hint: Let E[N_i] denote the mean number of games needed if you initially
play i. Derive an expression for E[N_A] that involves E[N_B]; write down the
equivalent expression for E[N_B] and then subtract.

27. A coin that comes up heads with probability p is continually flipped until the 
pattern T, T, H appears. (That is, you stop flipping when the most recent flip lands 
heads, and the two immediately preceding it land tails.) Let X denote the number
of flips made, and find E[X].

28. Polya’s urn model supposes that an urn initially contains r red and b blue balls.
At each stage a ball is randomly selected from the urn and is then returned along
with m other balls of the same color. Let X_k be the number of red balls drawn in
the first k selections.

(a) Find E[X_1].

(b) Find E[X_2].

(c) Find E[X_3].

(d) Conjecture the value of E[X_k], and then verify your conjecture by a conditioning
argument.

(e) Give an intuitive proof for your conjecture. 

Hint: Number the initial r red and b blue balls, so the urn contains one type i red 
ball, for each i = 1, ..., r, as well as one type j blue ball, for each j = 1, ..., b.
Now suppose that whenever a red ball is chosen it is returned along with m others 
of the same type, and similarly whenever a blue ball is chosen it is returned along 
with m others of the same type. Now, use a symmetry argument to determine the 
probability that any given selection is red. 

29. Two players take turns shooting at a target, with each shot by player i hitting the
target with probability p_i, i = 1, 2. Shooting ends when two consecutive shots
hit the target. Let μ_i denote the mean number of shots taken when player i shoots
first, i = 1, 2.

(a) Find μ_1 and μ_2.

(b) Let h_i denote the mean number of times that the target is hit when player i
shoots first, i = 1, 2. Find h_1 and h_2.

30. Let X_i, i ≥ 0 be independent and identically distributed random variables with
probability mass function

$$p(j) = P\{X_i = j\}, \quad j = 1, \ldots, m, \qquad \sum_{j=1}^{m} p(j) = 1$$

Find E[N], where N = min{n > 0 : X_n = X_0}.

31. Each element in a sequence of binary data is either 1 with probability p or 0 with 
probability 1 — p. A maximal subsequence of consecutive values having identical 
outcomes is called a run. For instance, if the outcome sequence is 1, 1, 0, 1, 1, 1, 0,
the first run is of length 2, the second is of length 1, and the third is of length 3. 

(a) Find the expected length of the first run. 

(b) Find the expected length of the second run. 

32. Independent trials, each resulting in success with probability p, are performed. 

(a) Find the expected number of trials needed for there to have been both at least 
n successes and at least m failures. 

Hint: Is it useful to know the result of the first n + m trials? 

(b) Find the expected number of trials needed for there to have been either at 
least n successes or at least m failures.

Hint: Make use of the result from part (a).

33. If R_i denotes the random amount that is earned in period i, then Σ_{i=1}^∞ β^{i-1} R_i,
where 0 < β < 1 is a specified constant, is called the total discounted reward with
discount factor β. Let T be a geometric random variable with parameter 1 - β
that is independent of the R_i. Show that the expected total discounted reward is
equal to the expected total (undiscounted) reward earned by time T. That is, show
that

$$E\left[\sum_{i=1}^{\infty} \beta^{i-1} R_i\right] = E\left[\sum_{i=1}^{T} R_i\right]$$
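
The identity in Exercise 33 can be sanity-checked numerically before attempting a proof. The sketch below makes two simplifying assumptions that are not in the exercise: the rewards are deterministic and are zero after a finite horizon, and T takes values 1, 2, ... with P{T = t} = β^{t-1}(1 - β). The function names are illustrative only.

```python
from fractions import Fraction

def discounted_total(rewards, beta):
    """Sum of beta^(i-1) * r_i for i = 1..M (rewards after period M are zero)."""
    return sum(beta**i * r for i, r in enumerate(rewards))

def expected_total_up_to_T(rewards, beta):
    """E[sum_{i<=T} r_i] when P{T = t} = beta^(t-1) * (1 - beta), t = 1, 2, ..."""
    horizon = len(rewards)
    partial = Fraction(0)    # r_1 + ... + r_t
    expected = Fraction(0)
    for t, r in enumerate(rewards, start=1):
        partial += r
        expected += beta**(t - 1) * (1 - beta) * partial
    # For T > horizon the partial sum no longer changes; P{T > horizon} = beta^horizon.
    expected += beta**horizon * partial
    return expected

rewards = [Fraction(3), Fraction(1), Fraction(4), Fraction(1), Fraction(5)]
beta = Fraction(2, 3)
print(discounted_total(rewards, beta), expected_total_up_to_T(rewards, beta))
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.
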


34. A set of n dice is thrown. All those that land on six are put aside, and the others 
are again thrown. This is repeated until all the dice have landed on six. Let N 
denote the number of throws needed. (For instance, suppose that n = 3 and that 
on the initial throw exactly two of the dice land on six. Then the other die will be 
thrown, and if it lands on six, then N = 2.) Let m_n = E[N].

(a) Derive a recursive formula for m_n and use it to calculate m_i, i = 2, 3, 4 and
to show that m_5 ≈ 13.024.

(b) Let X_i denote the number of dice rolled on the ith throw. Find E[Σ_{i=1}^N X_i].
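
The value m_5 ≈ 13.024 quoted in part (a) can be reproduced numerically. The sketch below is one possible recursion, obtained by conditioning on the number of sixes on the first throw (an assumption about the intended approach, with an illustrative function name); it is a check on the stated number, not a substitute for the derivation.

```python
from math import comb

def mean_throws(n_max):
    """Return [m_0, ..., m_n_max], with m_n the expected number of throws for n dice."""
    p = 1 / 6
    m = [0.0]  # m_0 = 0: with no dice left, no further throws are needed
    for n in range(1, n_max + 1):
        # j sixes on the first throw (j >= 1) leaves n - j dice still to be thrown.
        rest = sum(comb(n, j) * p**j * (1 - p)**(n - j) * m[n - j]
                   for j in range(1, n + 1))
        # The j = 0 term equals (5/6)^n * m_n and has been moved to the left side.
        m.append((1 + rest) / (1 - (1 - p)**n))
    return m

m = mean_throws(5)
print([round(x, 3) for x in m[1:]])
```
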

35. Consider n multinomial trials, where each trial independently results in outcome
i with probability p_i, Σ_{i=1}^k p_i = 1. With X_i equal to the number of trials that
result in outcome i, find E[X_1|X_2 > 0].

36. Let p_0 = P{X = 0} and suppose that 0 < p_0 < 1. Let μ = E[X] and
σ^2 = Var(X).

(a) Find E[X|X ≠ 0].

(b) Find Var(X|X ≠ 0).

37. A manuscript is sent to a typing firm consisting of typists A, B, and C. If it is 
typed by A, then the number of errors made is a Poisson random variable with 
mean 2.6; if typed by B, then the number of errors is a Poisson random variable 
with mean 3; and if typed by C, then it is a Poisson random variable with mean 
3.4. Let X denote the number of errors in the typed manuscript. Assume that each 
typist is equally likely to do the work. 

(a) Find E[X].

(b) Find Var(X). 

38. Suppose Y is uniformly distributed on (0, 1), and that the conditional distribution 
of X given that Y = y is uniform on (0, y). Find E[X] and Var(X). 

39. A deck of n cards, numbered 1 through n, is randomly shuffled so that all n!
possible permutations are equally likely. The cards are then turned over one at a 
time until card number 1 appears. These upturned cards constitute the first cycle. 
We now determine (by looking at the upturned cards) the lowest numbered card 
that has not yet appeared, and we continue to turn the cards face up until that card 
appears. This new set of cards represents the second cycle. We again determine
the lowest numbered of the remaining cards and turn the cards until it appears,
and so on until all cards have been turned over. Let m_n denote the mean number
of cycles.

(a) Derive a recursive formula for m_n in terms of m_k, k = 1, ..., n - 1.

(b) Starting with m_0 = 0, use the recursion to find m_1, m_2, m_3, and m_4.

(c) Conjecture a general formula for m_n.

(d) Prove your formula by induction on n. That is, show it is valid for n = 1,
then assume it is true for any of the values 1, ..., n - 1 and show that this
implies it is true for n.

(e) Let X_i equal 1 if one of the cycles ends with card i, and let it equal 0 otherwise,
i = 1, ..., n. Express the number of cycles in terms of these X_i.

(f) Use the representation in part (e) to determine m_n.

(g) Are the random variables X_1, ..., X_n independent? Explain.

(h) Find the variance of the number of cycles. 

40. A prisoner is trapped in a cell containing three doors. The first door leads to a 
tunnel that returns him to his cell after two days of travel. The second leads to a 
tunnel that returns him to his cell after three days of travel. The third door leads 
immediately to freedom. 

(a) Assuming that the prisoner will always select doors 1, 2, and 3 with probabilities
0.5, 0.3, 0.2, what is the expected number of days until he reaches
freedom?

(b) Assuming that the prisoner is always equally likely to choose among those 
doors that he has not used, what is the expected number of days until he 
reaches freedom? (In this version, for instance, if the prisoner initially tries
door 1, then when he returns to the cell, he will now select only from doors
2 and 3.)

(c) For parts (a) and (b) find the variance of the number of days until the prisoner 
reaches freedom. 

41. Workers 1, ..., n are currently idle. Suppose that each worker, independently,
has probability p of being eligible for a job, and that a job is equally likely to be 
assigned to any of the workers that are eligible for it (if none are eligible, the job 
is rejected). Find the probability that the next job is assigned to worker 1. 

*42. If X_i, i = 1, ..., n are independent normal random variables, with X_i having
mean μ_i and variance 1, then the random variable Σ_{i=1}^n X_i^2 is said to be a
noncentral chi-squared random variable.

(a) If X is a normal random variable having mean μ and variance 1 show, for
t < 1/2, that the moment generating function of X^2 is

$$(1 - 2t)^{-1/2}\, e^{t\mu^2/(1 - 2t)}$$

(b) Derive the moment generating function of the noncentral chi-squared random
variable Σ_{i=1}^n X_i^2, and show that its distribution depends on the sequence of
means μ_1, ..., μ_n only through the sum of their squares. As a result, we say
that Σ_{i=1}^n X_i^2 is a noncentral chi-squared random variable with parameters n
and θ = Σ_{i=1}^n μ_i^2.

(c) If all μ_i = 0, then Σ_{i=1}^n X_i^2 is called a chi-squared random variable with
n degrees of freedom. Determine, by differentiating its moment generating
function, its expected value and variance.

(d) Let K be a Poisson random variable with mean θ/2, and suppose that conditional
on K = k, the random variable W has a chi-squared distribution
with n + 2k degrees of freedom. Show, by computing its moment generating
function, that W is a noncentral chi-squared random variable with parameters
n and θ.

(e) Find the expected value and variance of a noncentral chi-squared random
variable with parameters n and θ.

43. For P(Y ∈ A) > 0, show that

$$E[X \mid Y \in A] = \frac{E[X I\{Y \in A\}]}{P(Y \in A)}$$

where I{B} is the indicator variable of the event B, equal to 1 if B occurs and to
0 otherwise.

44. The number of customers entering a store on a given day is Poisson distributed 
with mean λ = 10. The amount of money spent by a customer is uniformly
distributed over (0, 100). Find the mean and variance of the amount of money 
that the store takes in on a given day. 

45. An individual traveling on the real line is trying to reach the origin. However, 
the larger the desired step, the greater is the variance in the result of that step. 
Specifically, whenever the person is at location x, he next moves to a location 
having mean 0 and variance βx^2. Let X_n denote the position of the individual
after having taken n steps. Supposing that X_0 = x_0, find

(a) E[X_n];

(b) Var(X_n).

46. (a) Show that 


Cov(X, Y) = Cov(X, E[Y|X])


(b) Suppose that, for constants a and b,

E[Y|X] = a + bX


Show that 


b = Cov(X, Y)/Var(X)

*47. If E[Y|X] = 1, show that

Var(XY) ≥ Var(X)

48. Suppose that we want to predict the value of a random variable X by using one
of the predictors Y_1, ..., Y_n, each of which satisfies E[Y_i|X] = X. Show that the
predictor Y_i that minimizes E[(Y_i - X)^2] is the one whose variance is smallest.
Hint: Compute Var(Y_i) by using the conditional variance formula.

49. A and B play a series of games with A winning each game with probability p. 
The overall winner is the first player to have won two more games than the other. 

(a) Find the probability that A is the overall winner. 

(b) Find the expected number of games played. 

50. There are three coins in a barrel. These coins, when flipped, will come up heads 
with respective probabilities 0.3,0.5,0.7. A coin is randomly selected from among 
these three and is then flipped ten times. Let N be the number of heads obtained 
on the ten flips. 

(a) Find P{N = 0}. 

(b) Find P{N = n}, n = 0, 1, ..., 10.

(c) Does N have a binomial distribution? 

(d) If you win $1 each time a head appears and you lose $1 each time a tail 
appears, is this a fair game? Explain. 

51. If X is geometric with parameter p, find the probability that X is even. 

52. Suppose that X and Y are independent random variables with probability density 
functions f_X and f_Y. Determine a one-dimensional integral expression for P{X +
Y < x}.

*53. Suppose X is a Poisson random variable with mean λ. The parameter λ is itself
a random variable whose distribution is exponential with mean 1. Show that
P{X = n} = (1/2)^{n+1}.
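
A quick numerical check of the claim P{X = n} = (1/2)^{n+1} in Exercise 53 can be made by integrating the Poisson mass function against the exponential density with mean 1. The sketch below uses a plain midpoint rule; the truncation point and step count are arbitrary choices made for illustration, and the function name is hypothetical.

```python
from math import exp, factorial

def mixed_poisson_pmf(n, upper=40.0, steps=40_000):
    """Approximate the integral of e^{-lam} lam^n / n! * e^{-lam} over (0, upper)."""
    h = upper / steps
    total = 0.0
    for s in range(steps):
        lam = (s + 0.5) * h                      # midpoint of the s-th subinterval
        poisson = exp(-lam) * lam**n / factorial(n)
        density = exp(-lam)                      # exponential density with mean 1
        total += poisson * density
    return total * h

for n in range(4):
    print(n, round(mixed_poisson_pmf(n), 6), 0.5 ** (n + 1))
```
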

54. Independent trials, each resulting in a success with probability p, are performed 
until k consecutive successful trials have occurred. Let X be the total number of 
successes in these trials, and let P_n = P(X = n).

(a) Find P_k.

(b) Derive a recursive equation for the P_n, n ≥ k, by imagining that the trials
continue forever and conditioning on the time of the first failure.

(c) Verify your answer in part (a) by solving the recursion for P_k.

(d) When p = .6, k = 3, find P_8.

55. In the preceding problem let M_k = E[X]. Derive a recursive equation for M_k and
then solve.

Hint: Start by writing X_k = X_{k-1} + A_{k-1,k}, where X_i is the total number of
successes attained up to the first time there have been i consecutive successes,
and A_{k-1,k} is the additional number of successes after there have been k - 1
successes in a row until there have been k successes in a row.

56. Data indicate that the number of traffic accidents in Berkeley on a rainy day is a 
Poisson random variable with mean 9, whereas on a dry day it is a Poisson random 
variable with mean 3. Let X denote the number of traffic accidents tomorrow. If 
it will rain tomorrow with probability 0.6, find

(a) E[X];

(b) P{X = 0}; 

(c) Var(X). 

57. The number of storms in the upcoming rainy season is Poisson distributed but 
with a parameter value that is uniformly distributed over (0, 5). That is, Λ is
uniformly distributed over (0, 5), and given that Λ = λ, the number of storms
is Poisson with mean λ. Find the probability there are at least three storms this
season. 

58. Suppose that the conditional distribution of N , given that Y = y, is Poisson with 
mean y. Further suppose that Y is a gamma random variable with parameters 
(r, λ), where r is a positive integer. That is, suppose that

$$P(N = n \mid Y = y) = e^{-y}\, \frac{y^n}{n!}, \qquad n \ge 0$$

and

$$f_Y(y) = \frac{\lambda e^{-\lambda y} (\lambda y)^{r-1}}{(r - 1)!}, \qquad y > 0$$

(a) Find E[N].

(b) Find Var(N).

(c) Find P(N = n).

(d) Using (c), argue that N is distributed as the total number of failures before
the rth success when each trial is independently a success with probability

$$p = \frac{\lambda}{1 + \lambda}$$


59. Suppose each new coupon collected is, independent of the past, a type i coupon
with probability p_i. A total of n coupons is to be collected. Let A_i be the event
that there is at least one type i in this set. For i ≠ j, compute P(A_i A_j) by

(a) conditioning on N_i, the number of type i coupons in the set of n coupons;

(b) conditioning on F_i, the first time a type i coupon is collected;

(c) using the identity P(A_i ∪ A_j) = P(A_i) + P(A_j) - P(A_i A_j).

*60. Two players alternate flipping a coin that comes up heads with probability p. The 
first one to obtain a head is declared the winner. We are interested in the probability 
that the first player to flip is the winner. Before determining this probability, which 
we will call f(p), answer the following questions. 

(a) Do you think that f(p) is a monotone function of p ? If so, is it increasing or 
decreasing? 

(b) What do you think is the value of lim_{p→1} f(p)?

(c) What do you think is the value of lim_{p→0} f(p)?

(d) Find f(p). 

61. Suppose in Exercise 29 that the shooting ends when the target has been hit twice. 
Let m_i denote the mean number of shots needed for the first hit when player i
shoots first, i = 1, 2. Also, let P_i, i = 1, 2, denote the probability that the first
hit is by player 1, when player i shoots first.

(a) Find m_1 and m_2.

(b) Find P_1 and P_2.

For the remainder of the problem, assume that player 1 shoots first. 

(c) Find the probability that the final hit was by 1. 

(d) Find the probability that both hits were by 1. 

(e) Find the probability that both hits were by 2. 

(f) Find the mean number of shots taken. 

62. A, B, and C are evenly matched tennis players. Initially A and B play a set, 
and the winner then plays C. This continues, with the winner always playing the 
waiting player, until one of the players has won two sets in a row. That player is 
then declared the overall winner. Find the probability that A is the overall winner. 

63. Suppose there are n types of coupons, and that the type of each new coupon 
obtained is independent of past selections and is equally likely to be any of the 
n types. Suppose one continues collecting until a complete set of at least one of 
each type is obtained. 

(a) Find the probability that there is exactly one type i coupon in the final
collection.

Hint: Condition on T, the number of types that are collected before the first 
type i appears. 

(b) Find the expected number of types that appear exactly once in the final 
collection. 

64. A and B roll a pair of dice in turn, with A rolling first. A’s objective is to obtain 
a sum of 6, and B’s is to obtain a sum of 7. The game ends when either player
reaches his or her objective, and that player is declared the winner. 

(a) Find the probability that A is the winner. 

(b) Find the expected number of rolls of the dice. 

(c) Find the variance of the number of rolls of the dice. 

65. The number of red balls in an urn that contains n balls is a random variable that
is equally likely to be any of the values 0, 1, ..., n. That is,

$$P\{i \text{ red},\ n - i \text{ non-red}\} = \frac{1}{n + 1}, \qquad i = 0, \ldots, n$$

The n balls are then randomly removed one at a time. Let Y_k denote the number
of red balls in the first k selections, k = 1, ..., n.

(a) Find P{Y_n = j}, j = 0, ..., n.

(b) Find P{Y_{n-1} = j}, j = 0, ..., n.

(c) What do you think is the value of P{Y_k = j}, j = 0, ..., n?

(d) Verify your answer to part (c) by a backwards induction argument. That is,
check that your answer is correct when k = n, and then show that whenever
it is true for k it is also true for k - 1, k = 1, ..., n.

66. The opponents of soccer team A are of two types: either they are a class 1 or a 
class 2 team. The number of goals team A scores against a class i opponent is a 
Poisson random variable with mean λ_i, where λ_1 = 2, λ_2 = 3. This weekend the
team has two games against teams they are not very familiar with. Assuming that 
the first team they play is a class 1 team with probability 0.6 and the second is, 
independently of the class of the first team, a class 1 team with probability 0.3, 
determine 

(a) the expected number of goals team A will score this weekend. 

(b) the probability that team A will score a total of five goals. 

*67. A coin having probability p of coming up heads is continually flipped. Let P_j(n)
denote the probability that a run of j successive heads occurs within the first n
flips.

(a) Argue that

$$P_j(n) = P_j(n - 1) + p^j (1 - p)\,[1 - P_j(n - j - 1)]$$

(b) By conditioning on the first non-head to appear, derive another equation relating
P_j(n) to the quantities P_j(n - k), k = 1, ..., j.
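
The recursion in part (a) of Exercise 67 can be checked against direct enumeration for small n. The sketch below assumes the boundary values P_j(n) = 0 for n < j and P_j(j) = p^j, which the exercise does not state explicitly, and uses illustrative function names.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

def run_prob_recursive(j, n, p):
    """P_j(n) from the recursion in part (a), with assumed boundary values."""
    @lru_cache(maxsize=None)
    def P(k):
        if k < j:
            return Fraction(0)        # too few flips for a run of length j
        if k == j:
            return p**j               # all of the first j flips must be heads
        return P(k - 1) + p**j * (1 - p) * (1 - P(k - j - 1))
    return P(n)

def run_prob_brute(j, n, p):
    """Sum the probabilities of all length-n sequences containing j heads in a row."""
    total = Fraction(0)
    for flips in product("HT", repeat=n):
        s = "".join(flips)
        if "H" * j in s:
            heads = s.count("H")
            total += p**heads * (1 - p)**(n - heads)
    return total

p = Fraction(1, 3)
print(run_prob_recursive(2, 6, p), run_prob_brute(2, 6, p))
```

Exact fractions are used so that the recursive and enumerated values can be compared for equality.
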

68. In a knockout tennis tournament of 2^n contestants, the players are paired and play
a match. The losers depart, the remaining 2^{n-1} players are paired, and they play a
match. This continues for n rounds, after which a single player remains unbeaten
and is declared the winner. Suppose that the contestants are numbered 1 through
2^n, and that whenever two players contest a match, the lower numbered one wins
with probability p. Also suppose that the pairings of the remaining players are 
always done at random so that all possible pairings for that round are equally 
likely. 

(a) What is the probability that player 1 wins the tournament? 

(b) What is the probability that player 2 wins the tournament? 

Hint: Imagine that the random pairings are done in advance of the tournament. 
That is, the first-round pairings are randomly determined; the 2^{n-1} first-round
pairs are then themselves randomly paired, with the winners of each pair to play
in round 2; these 2^{n-2} groupings (of four players each) are then randomly paired,
with the winners of each grouping to play in round 3, and so on. Say that players 
i and j are scheduled to meet in round k if, provided they both win their first 
k — 1 matches, they will meet in round k. Now condition on the round in which 
players 1 and 2 are scheduled to meet. 

69. In the match problem, say that (i, j), i < j, is a pair if i chooses j’s hat and j 
chooses i’s hat.

(a) Find the expected number of pairs.

Figure 3.7

(b) Let Q_n denote the probability that there are no pairs, and derive a recursive
formula for Q_n in terms of Q_j, j < n.

Hint: Use the cycle concept. 

(c) Use the recursion of part (b) to find Q_8.

70. Let N denote the number of cycles that result in the match problem. 

(a) Let M_n = E[N], and derive an equation for M_n in terms of M_1, ..., M_{n-1}.

(b) Let C_j denote the size of the cycle that contains person j. Argue that

$$N = \sum_{j=1}^{n} \frac{1}{C_j}$$

and use the preceding to determine E[N].

(c) Find the probability that persons 1, 2, ..., k are all in the same cycle.

(d) Find the probability that 1, 2,..., k is a cycle. 

71. Use Equation (3.13) to obtain Equation (3.9). 

Hint: First multiply both sides of Equation (3.13) by n, then write a new equation
by replacing n with n — 1, and then subtract the former from the latter. 

72. In Example 3.28 show that the conditional distribution of N given that U_1 = y is
the same as the conditional distribution of M given that U_1 = 1 - y. Also, show
that

E[N|U_1 = y] = E[M|U_1 = 1 - y] = 1 + e^y

*73. Suppose that we continually roll a die until the sum of all throws exceeds 100. 
What is the most likely value of this total when you stop? 

74. There are five components. The components act independently, with component 
i working with probability p_i, i = 1, 2, 3, 4, 5. These components form a system
as shown in Figure 3.7. 

The system is said to work if a signal originating at the left end of the diagram 
can reach the right end, where it can pass through a component only if that 
component is working. (For instance, if components 1 and 4 both work, then the 
system also works.) What is the probability that the system works?

75. This problem will present another proof of the ballot problem of Example 3.27.

(a) Argue that

P_{n,m} = 1 - P{A and B are tied at some point}

(b) Explain why 

P{A receives first vote and they are eventually tied} 

= P{B receives first vote and they are eventually tied} 

Hint: Any outcome in which they are eventually tied with A receiving the first 
vote corresponds to an outcome in which they are eventually tied with B receiving 
the first vote. Explain this correspondence. 

(c) Argue that P{eventually tied} = 2m/(n + m), and conclude that P_{n,m} =
(n - m)/(n + m).
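
The conclusion P_{n,m} = (n - m)/(n + m) can be confirmed for small cases by enumerating every ordering of the votes. The sketch below treats all orderings of n votes for A and m votes for B as equally likely, as in the ballot problem, and counts those in which A is strictly ahead after every vote; the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

def prob_always_ahead(n, m):
    """Exact probability that A (n votes) strictly leads B (m votes) throughout."""
    total = 0
    ahead = 0
    for a_positions in combinations(range(n + m), n):
        a_set = set(a_positions)      # positions in the count that are votes for A
        lead = 0
        always = True
        for t in range(n + m):
            lead += 1 if t in a_set else -1
            if lead <= 0:             # tied or behind at some point
                always = False
                break
        ahead += always
        total += 1
    return Fraction(ahead, total)

print(prob_always_ahead(5, 3), Fraction(5 - 3, 5 + 3))
```
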

76. Consider a gambler who on each bet either wins 1 with probability 18/38 or loses 
1 with probability 20/38. (These are the probabilities if the bet is that a roulette 
wheel will land on a specified color.) The gambler will quit either when he or she 
is winning a total of 5 or after 100 plays. What is the probability he or she plays 
exactly 15 times? 

77. Show that 

(a) E[XY|Y = y] = yE[X|Y = y]

(b) E[g(X, Y)|Y = y] = E[g(X, y)|Y = y]

(c) E[XY] = E[YE[X|Y]]

78. In the ballot problem (Example 3.27), compute P{A is never behind}.

79. An urn contains n white and m black balls that are removed one at a time. If n > m,
show that the probability that there are always more white than black balls in the
urn (until, of course, the urn is empty) equals (n - m)/(n + m). Explain why
this probability is equal to the probability that the set of withdrawn balls always
contains more white than black balls. (This latter probability is (n - m)/(n + m)
by the ballot problem.)

80. A coin that comes up heads with probability p is flipped n consecutive times. 
What is the probability that starting with the first flip there are always more heads 
than tails that have appeared? 

81. Let X_i, i ≥ 1, be independent uniform (0, 1) random variables, and define N by

N = min{n: X_n < X_{n-1}}

where X_0 = x. Let f(x) = E[N].

(a) Derive an integral equation for f(x) by conditioning on X_1.

(b) Differentiate both sides of the equation derived in part (a).

(c) Solve the resulting equation obtained in part (b).

(d) For a second approach to determining f(x) argue that

$$P\{N \ge k\} = \frac{(1 - x)^{k-1}}{(k - 1)!}$$

(e) Use part (d) to obtain f(x).

82. Let X_1, X_2, ... be independent continuous random variables with a common
distribution function F and density f = F', and for k ≥ 1 let

$$N_k = \min\{n \ge k : X_n = k\text{th largest of } X_1, \ldots, X_n\}$$

(a) Show that P{N_k = n} = (k - 1)/(n(n - 1)), n ≥ k.

(b) Argue that

$$f_{X_{N_k}}(x) = f(x)\,(\bar{F}(x))^{k-1} \sum_{i=0}^{\infty} \binom{i + k - 2}{i} (F(x))^i$$

where \bar{F}(x) = 1 - F(x).

(c) Prove the following identity:

$$a^{1-k} = \sum_{i=0}^{\infty} \binom{i + k - 2}{i} (1 - a)^i, \qquad 0 < a < 1,\ k \ge 2$$

Hint: Use induction. First prove it when k = 2, and then assume it for k. To
prove it for k + 1, use the fact that

$$\sum_{i=1}^{\infty} \binom{i + k - 1}{i} (1 - a)^i = \sum_{i=1}^{\infty} \binom{i + k - 2}{i} (1 - a)^i + \sum_{i=1}^{\infty} \binom{i + k - 2}{i - 1} (1 - a)^i$$

where the preceding used the combinatorial identity

$$\binom{m}{i} = \binom{m - 1}{i} + \binom{m - 1}{i - 1}$$

Now, use the induction hypothesis to evaluate the first term on the right side of
the preceding equation.

(d) Conclude that X_{N_k} has distribution F.

83. An urn contains n balls, with ball i having weight w_i, i = 1, ..., n. The balls are
withdrawn from the urn one at a time according to the following scheme: When
S is the set of balls that remains, ball i, i ∈ S, is the next ball withdrawn with
probability w_i / Σ_{j∈S} w_j. Find the expected number of balls that are withdrawn
before ball i, i = 1, ..., n.

84. Suppose in Example 3.32 that a point is only won if the winner of the rally was
the server of that rally. 

(a) If A is currently serving, what is the probability that A wins the next point? 

(b) Explain how to obtain the final score probabilities. 

85. In the list problem, when the P_i are known, show that the best ordering (best in the
sense of minimizing the expected position of the element requested) is to place
the elements in decreasing order of their probabilities. That is, if P_1 > P_2 > ···
> P_n, show that 1, 2, ..., n is the best ordering.

86. Consider the random graph of Section 3.6.2 when n = 5. Compute the probability
distribution of the number of components and verify your solution by using it to
compute E[C] and then comparing your solution with

$$E[C] = \sum_{k=1}^{5} \binom{5}{k} \frac{(k - 1)!}{5^k}$$
87. (a) From the results of Section 3.6.3 we can conclude that there are \binom{n+m-1}{m-1}
nonnegative integer valued solutions of the equation x_1 + ··· + x_m = n.
Prove this directly.

(b) How many positive integer valued solutions of x_1 + ··· + x_m = n are there?
Hint: Let y_i = x_i - 1.

(c) For the Bose-Einstein distribution, compute the probability that exactly k of
the X_i are equal to 0.

88. In Section 3.6.3, we saw that if U is a random variable that is uniform on (0, 1)
and if, conditional on U = p, X is binomial with parameters n and p, then

$$P\{X = i\} = \frac{1}{n+1}, \quad i = 0, 1, \ldots, n$$

For another way of showing this result, let $U, X_1, X_2, \ldots, X_n$ be independent
uniform (0, 1) random variables. Define X by

$$X = \#\, i : X_i < U$$

That is, if the n + 1 variables are ordered from smallest to largest, then U would
be in position X + 1.

(a) What is $P\{X = i\}$?

(b) Explain how this proves the result of Section 3.6.3. 

89. Let $I_1, \ldots, I_n$ be independent random variables, each of which is equally likely
to be either 0 or 1. A well-known nonparametric statistical test (called the signed
rank test) is concerned with determining $P_n(k)$ defined by

$$P_n(k) = P\left\{ \sum_{j=1}^{n} j I_j \le k \right\}$$

Justify the following formula:

$$P_n(k) = \tfrac{1}{2} P_{n-1}(k) + \tfrac{1}{2} P_{n-1}(k - n)$$

90. The number of accidents in each period is a Poisson random variable with mean
5. With $X_n$, $n \ge 1$, equal to the number of accidents in period n, find E[N] when

(a) $N = \min(n : X_{n-2} = 2, X_{n-1} = 1, X_n = 0)$;

(b) $N = \min(n : X_{n-3} = 2, X_{n-2} = 1, X_{n-1} = 0, X_n = 2)$.

91. Find the expected number of flips of a coin, which comes up heads with probability
p, that are necessary to obtain the pattern h, t, h, h, t, h, t, h.

92. The number of coins that Josh spots when walking to work is a Poisson random 
variable with mean 6. Each coin is equally likely to be a penny, a nickel, a dime, 
or a quarter. Josh ignores the pennies but picks up the other coins. 

(a) Find the expected amount of money that Josh picks up on his way to work. 

(b) Find the variance of the amount of money that Josh picks up on his way to 
work. 

(c) Find the probability that Josh picks up exactly 25 cents on his way to work. 

*93. Consider a sequence of independent trials, each of which is equally likely to
result in any of the outcomes 0, 1, ..., m. Say that a round begins with the first
trial, and that a new round begins each time outcome 0 occurs. Let N denote the
number of trials that it takes until all of the outcomes 1, ..., m − 1 have occurred
in the same round. Also, let $T_j$ denote the number of trials that it takes until j
distinct outcomes have occurred, and let $I_j$ denote the jth distinct outcome to
occur. (Therefore, outcome $I_j$ first occurs at trial $T_j$.)

(a) Argue that the random vectors $(I_1, \ldots, I_m)$ and $(T_1, \ldots, T_m)$ are independent.

(b) Define X by letting X = j if outcome 0 is the jth distinct outcome to
occur. (Thus, $I_X = 0$.) Derive an equation for E[N] in terms of $E[T_j]$, j =
1, ..., m − 1 by conditioning on X.

(c) Determine $E[T_j]$, j = 1, ..., m − 1.

Hint: See Exercise 42 of Chapter 2.

(d) Find E[N].

94. Let N be a hypergeometric random variable having the distribution of the number
of white balls in a random sample of size r from a set of w white and b blue balls.
That is,

$$P\{N = n\} = \frac{\binom{w}{n}\binom{b}{r-n}}{\binom{w+b}{r}}$$

where we use the convention that $\binom{m}{j} = 0$ if either j < 0 or j > m. Now,
consider a compound random variable $S_N = \sum_{i=1}^{N} X_i$, where the $X_i$ are positive
integer valued random variables with $\alpha_j = P\{X_i = j\}$.

(a) With M as defined as in Section 3.7, find the distribution of M − 1.

(b) Suppressing its dependence on b, let $P_{w,r}(k) = P\{S_N = k\}$, and derive a
recursion equation for $P_{w,r}(k)$.

(c) Use the recursion of (b) to find $P_{w,r}(2)$.

95. For the left skip free random walk of Section 3.6.6 let $\beta = P(S_n \le 0 \text{ for all } n)$
be the probability that the walk is never positive. Find $\beta$ when $E[X_i] < 0$.

96. Consider a large population of families, and suppose that the number of children
in the different families are independent Poisson random variables with mean
$\lambda$. Show that the number of siblings of a randomly chosen child is also Poisson
distributed with mean $\lambda$.

*97. Use the conditional variance formula to find the variance of a geometric random 
variable. 

98. For a compound random variable $S = \sum_{i=1}^{N} X_i$, find Cov(N, S).

99. Let N be the number of trials until k consecutive successes have occurred, when
each trial is independently a success with probability p.

(a) What is P(N = k)?

(b) Argue that

$$P(N = k + r) = P(N > r - 1)\, q p^k, \quad r > 0$$

(c) Show that

$$1 - p^k = q p^k E[N]$$

4.1 Introduction

Consider a process that has a value in each time period. Let $X_n$ denote its value in
time period n, and suppose we want to make a probability model for the sequence of
successive values $X_0, X_1, X_2, \ldots$. The simplest model would probably be to assume
that the $X_n$ are independent random variables, but often such an assumption is clearly
unjustified. For instance, starting at some time suppose that $X_n$ represents the price of
one share of some security, such as Google, at the end of n additional trading days.
Then it certainly seems unreasonable to suppose that the price at the end of day n + 1
is independent of the prices on days n, n − 1, n − 2 and so on down to day 0. However,
it might be reasonable to suppose that the price at the end of trading day n + 1 depends
on the previous end-of-day prices only through the price at the end of day n. That is, it
might be reasonable to assume that the conditional distribution of $X_{n+1}$ given all the
past end-of-day prices $X_n, X_{n-1}, \ldots, X_0$ depends on these past prices only through
the price at the end of day n. Such an assumption defines a Markov chain, a type of
stochastic process that will be studied in this chapter, and which we now formally
define.

Let $\{X_n, n = 0, 1, 2, \ldots\}$ be a stochastic process that takes on a finite or countable
number of possible values. Unless otherwise mentioned, this set of possible values of
the process will be denoted by the set of nonnegative integers {0, 1, 2, ...}. If $X_n = i$,
then the process is said to be in state i at time n. We suppose that whenever the process
is in state i, there is a fixed probability $P_{ij}$ that it will next be in state j. That is, we
suppose that

$$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij} \tag{4.1}$$

for all states $i_0, i_1, \ldots, i_{n-1}, i, j$ and all $n \ge 0$. Such a stochastic process is known as a
Markov chain. Equation (4.1) may be interpreted as stating that, for a Markov chain, the
conditional distribution of any future state $X_{n+1}$, given the past states $X_0, X_1, \ldots, X_{n-1}$
and the present state $X_n$, is independent of the past states and depends only on the present
state.

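The value $P_{ij}$ represents the probability that the process will, when in state i, next
make a transition into state j. Since probabilities are nonnegative and since the process
must make a transition into some state, we have

$$P_{ij} \ge 0, \quad i, j \ge 0; \qquad \sum_{j=0}^{\infty} P_{ij} = 1, \quad i = 0, 1, \ldots$$

Let P denote the matrix of one-step transition probabilities $P_{ij}$, so that

$$\mathbf{P} = \begin{bmatrix}
P_{00} & P_{01} & P_{02} & \cdots \\
P_{10} & P_{11} & P_{12} & \cdots \\
\vdots & \vdots & \vdots & \\
P_{i0} & P_{i1} & P_{i2} & \cdots \\
\vdots & \vdots & \vdots &
\end{bmatrix}$$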

Example 4.1 (Forecasting the Weather) Suppose that the chance of rain tomorrow
depends on previous weather conditions only through whether or not it is raining today
and not on past weather conditions. Suppose also that if it rains today, then it will rain
tomorrow with probability $\alpha$; and if it does not rain today, then it will rain tomorrow
with probability $\beta$.

If we say that the process is in state 0 when it rains and state 1 when it does not
rain, then the preceding is a two-state Markov chain whose transition probabilities are
given by

$$\mathbf{P} = \begin{bmatrix} \alpha & 1 - \alpha \\ \beta & 1 - \beta \end{bmatrix}$$ ■

Example 4.2 (A Communications System) Consider a communications system that
transmits the digits 0 and 1. Each digit transmitted must pass through several stages, at
each of which there is a probability p that the digit entered will be unchanged when it
leaves. Letting $X_n$ denote the digit entering the nth stage, then $\{X_n, n = 0, 1, \ldots\}$ is a
two-state Markov chain having a transition probability matrix

$$\mathbf{P} = \begin{bmatrix} p & 1 - p \\ 1 - p & p \end{bmatrix}$$ ■

Example 4.3 On any given day Gary is either cheerful (C), so-so (S), or glum (G). If
he is cheerful today, then he will be C, S, or G tomorrow with respective probabilities
0.5, 0.4, 0.1. If he is feeling so-so today, then he will be C, S, or G tomorrow with
probabilities 0.3, 0.4, 0.3. If he is glum today, then he will be C, S, or G tomorrow
with probabilities 0.2, 0.3, 0.5.

Letting $X_n$ denote Gary's mood on the nth day, then $\{X_n, n \ge 0\}$ is a three-state
Markov chain (state 0 = C, state 1 = S, state 2 = G) with transition probability matrix

$$\mathbf{P} = \begin{bmatrix} 0.5 & 0.4 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}$$ ■

Example 4.4 (Transforming a Process into a Markov Chain) Suppose that whether
or not it rains today depends on previous weather conditions through the last two days.
Specifically, suppose that if it has rained for the past two days, then it will rain tomorrow
with probability 0.7; if it rained today but not yesterday, then it will rain tomorrow
with probability 0.5; if it rained yesterday but not today, then it will rain tomorrow with
probability 0.4; if it has not rained in the past two days, then it will rain tomorrow with
probability 0.2.

If we let the state at time n depend only on whether or not it is raining at time n, then
the preceding model is not a Markov chain (why not?). However, we can transform
this model into a Markov chain by saying that the state at any time is determined by
the weather conditions during both that day and the previous day. In other words, we
can say that the process is in

state 0 if it rained both today and yesterday,
state 1 if it rained today but not yesterday,
state 2 if it rained yesterday but not today,
state 3 if it did not rain either yesterday or today.

The preceding would then represent a four-state Markov chain having a transition
probability matrix

$$\mathbf{P} = \begin{bmatrix}
0.7 & 0 & 0.3 & 0 \\
0.5 & 0 & 0.5 & 0 \\
0 & 0.4 & 0 & 0.6 \\
0 & 0.2 & 0 & 0.8
\end{bmatrix}$$

You should carefully check the matrix P, and make sure you understand how it was
obtained. ■

Example 4.5 (A Random Walk Model) A Markov chain whose state space is given
by the integers $i = 0, \pm 1, \pm 2, \ldots$ is said to be a random walk if, for some number
$0 < p < 1$,

$$P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 0, \pm 1, \ldots$$

The preceding Markov chain is called a random walk for we may think of it as being
a model for an individual walking on a straight line who at each point of time either
takes one step to the right with probability p or one step to the left with probability
1 − p. ■

Example 4.6 (A Gambling Model) Consider a gambler who, at each play of the
game, either wins $1 with probability p or loses $1 with probability 1 − p. If we
suppose that our gambler quits playing either when he goes broke or he attains a fortune
of $N, then the gambler's fortune is a Markov chain having transition probabilities

$$P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 1, 2, \ldots, N - 1,$$

$$P_{00} = P_{NN} = 1$$

States 0 and N are called absorbing states since once entered they are never left.
Note that the preceding is a finite state random walk with absorbing barriers (states 0
and N). ■
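To make the transition probabilities of the gambling model concrete, here is a minimal Python sketch (not from the original text) that assembles the $(N+1) \times (N+1)$ matrix. The function name `gambler_matrix` and the sample values N = 4, p = 0.6 are illustrative assumptions.

```python
def gambler_matrix(N, p):
    """Transition matrix on states 0..N with absorbing barriers at 0 and N."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1.0          # broke: stay broke
    P[N][N] = 1.0          # reached the target fortune: stop playing
    for i in range(1, N):
        P[i][i + 1] = p        # win $1
        P[i][i - 1] = 1.0 - p  # lose $1
    return P

for row in gambler_matrix(4, 0.6):
    print(row)
```

Dropping the two absorbing rows and letting the index range over all integers would give the unrestricted random walk of Example 4.5.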

Example 4.7 In most of Europe and Asia annual automobile insurance premiums are 
determined by use of a Bonus Malus (Latin for Good-Bad) system. Each policyholder 
is given a positive integer valued state and the annual premium is a function of this 
state (along, of course, with the type of car being insured and the level of insurance). 
A policyholder’s state changes from year to year in response to the number of claims 
made by that policyholder. Because lower numbered states correspond to lower annual 
premiums, a policyholder’s state will usually decrease if he or she had no claims in the 
preceding year, and will generally increase if he or she had at least one claim. (Thus, 
no claims is good and typically results in a decreased premium, while claims are bad 
and typically result in a higher premium.) 

For a given Bonus Malus system, let $s_i(k)$ denote the next state of a policyholder who
was in state i in the previous year and who made a total of k claims in that year. If we
suppose that the number of yearly claims made by a particular policyholder is a Poisson
random variable with parameter $\lambda$, then the successive states of this policyholder will
constitute a Markov chain with transition probabilities

$$P_{i,j} = \sum_{k : s_i(k) = j} e^{-\lambda} \frac{\lambda^k}{k!}, \quad j \ge 0$$


Whereas there are usually many states (20 or so is not atypical), the following table
specifies a hypothetical Bonus Malus system having four states.

                                   Next state if
State   Annual Premium   0 claims   1 claim   2 claims   ≥ 3 claims
  1          200             1         2         3           4
  2          250             1         3         4           4
  3          400             2         4         4           4
  4          600             3         4         4           4

Thus, for instance, the table indicates that $s_2(0) = 1$; $s_2(1) = 3$; $s_2(k) = 4$, $k \ge 2$.
Consider a policyholder whose annual number of claims is a Poisson random variable
with parameter $\lambda$. If $a_k$ is the probability that such a policyholder makes k claims in a
year, then

$$a_k = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \ge 0$$

For the Bonus Malus system specified in the preceding table, the transition probability
matrix of the successive states of this policyholder is

$$\mathbf{P} = \begin{bmatrix}
a_0 & a_1 & a_2 & 1 - a_0 - a_1 - a_2 \\
a_0 & 0 & a_1 & 1 - a_0 - a_1 \\
0 & a_0 & 0 & 1 - a_0 \\
0 & 0 & a_0 & 1 - a_0
\end{bmatrix}$$

4.2 Chapman-Kolmogorov Equations 

We have already defined the one-step transition probabilities $P_{ij}$. We now define the
n-step transition probabilities $P^n_{ij}$ to be the probability that a process in state i will be
in state j after n additional transitions. That is,

$$P^n_{ij} = P\{X_{n+k} = j \mid X_k = i\}, \quad n \ge 0,\ i, j \ge 0$$

Of course $P^1_{ij} = P_{ij}$. The Chapman-Kolmogorov equations provide a method for
computing these n-step transition probabilities. These equations are

$$P^{n+m}_{ij} = \sum_{k=0}^{\infty} P^n_{ik} P^m_{kj} \quad \text{for all } n, m \ge 0, \text{ all } i, j \tag{4.2}$$

and are most easily understood by noting that $P^n_{ik} P^m_{kj}$ represents the probability that
starting in i the process will go to state j in n + m transitions through a path which
takes it into state k at the nth transition. Hence, summing over all intermediate states k
yields the probability that the process will be in state j after n + m transitions. Formally,
we have

$$\begin{aligned}
P^{n+m}_{ij} &= P\{X_{n+m} = j \mid X_0 = i\} \\
&= \sum_{k=0}^{\infty} P\{X_{n+m} = j, X_n = k \mid X_0 = i\} \\
&= \sum_{k=0}^{\infty} P\{X_{n+m} = j \mid X_n = k, X_0 = i\}\, P\{X_n = k \mid X_0 = i\} \\
&= \sum_{k=0}^{\infty} P^m_{kj} P^n_{ik}
\end{aligned}$$

If we let $\mathbf{P}^{(n)}$ denote the matrix of n-step transition probabilities $P^n_{ij}$, then Equation
(4.2) asserts that

$$\mathbf{P}^{(n+m)} = \mathbf{P}^{(n)} \cdot \mathbf{P}^{(m)}$$

where the dot represents matrix multiplication.* Hence, in particular,

$$\mathbf{P}^{(2)} = \mathbf{P}^{(1+1)} = \mathbf{P} \cdot \mathbf{P} = \mathbf{P}^2$$

and by induction

$$\mathbf{P}^{(n)} = \mathbf{P}^{(n-1+1)} = \mathbf{P}^{n-1} \cdot \mathbf{P} = \mathbf{P}^n$$

That is, the n-step transition matrix may be obtained by multiplying the matrix P by
itself n times.

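Example 4.8 Consider Example 4.1 in which the weather is considered as a two-state
Markov chain. If $\alpha = 0.7$ and $\beta = 0.4$, then calculate the probability that it will rain
four days from today given that it is raining today.

Solution: The one-step transition probability matrix is given by

$$\mathbf{P} = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$$

Hence,

$$\mathbf{P}^{(2)} = \mathbf{P}^2 = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix} \cdot \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix} = \begin{bmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{bmatrix},$$

$$\mathbf{P}^{(4)} = (\mathbf{P}^2)^2 = \begin{bmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{bmatrix} \cdot \begin{bmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{bmatrix} = \begin{bmatrix} 0.5749 & 0.4251 \\ 0.5668 & 0.4332 \end{bmatrix}$$

and the desired probability $P^4_{00}$ equals 0.5749. ■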
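The statement that the n-step matrix is the nth power of P can be checked numerically for this example. The following Python sketch is an illustration under stated assumptions: the helper names `mat_mul` and `mat_pow` are hypothetical, and repeated squaring is one convenient way to form the power (the text itself squares P twice by hand).

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(P, n):
    """Return the n-th power of the square matrix P (n >= 0) by repeated squaring."""
    size = len(P)
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    base = P
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

# Weather chain of Example 4.8: state 0 = rain, state 1 = no rain.
P = [[0.7, 0.3],
     [0.4, 0.6]]

P4 = mat_pow(P, 4)
print([[round(x, 4) for x in row] for row in P4])
```

Entry `P4[0][0]` is the four-step probability of rain given rain today.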


Example 4.9 Consider Example 4.4. Given that it rained on Monday and Tuesday, 
what is the probability that it will rain on Thursday? 


Solution: The two-step transition matrix is given by

$$\mathbf{P}^{(2)} = \mathbf{P}^2 = \begin{bmatrix}
0.7 & 0 & 0.3 & 0 \\
0.5 & 0 & 0.5 & 0 \\
0 & 0.4 & 0 & 0.6 \\
0 & 0.2 & 0 & 0.8
\end{bmatrix} \cdot \begin{bmatrix}
0.7 & 0 & 0.3 & 0 \\
0.5 & 0 & 0.5 & 0 \\
0 & 0.4 & 0 & 0.6 \\
0 & 0.2 & 0 & 0.8
\end{bmatrix} = \begin{bmatrix}
0.49 & 0.12 & 0.21 & 0.18 \\
0.35 & 0.20 & 0.15 & 0.30 \\
0.20 & 0.12 & 0.20 & 0.48 \\
0.10 & 0.16 & 0.10 & 0.64
\end{bmatrix}$$

Since rain on Thursday is equivalent to the process being in either state 0 or
state 1 on Thursday, the desired probability is given by $P^2_{00} + P^2_{01} = 0.49 +
0.12 = 0.61$. ■

* If A is an N × M matrix whose element in the ith row and jth column is $a_{ij}$ and B is an M × K matrix
whose element in the ith row and jth column is $b_{ij}$, then A · B is defined to be the N × K matrix whose
element in the ith row and jth column is $\sum_{k=1}^{M} a_{ik} b_{kj}$.
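As a cross-check on Examples 4.4 and 4.9, the four-state matrix can be generated from the verbal rain rules and then squared. This is a minimal sketch, not the book's procedure; `RAIN_PROB`, `STATES`, `build_matrix`, and `mat_mul` are illustrative names, and each state is encoded as the pair (rain today?, rain yesterday?).

```python
# Probability of rain tomorrow, keyed by (rain today?, rain yesterday?).
RAIN_PROB = {
    (True, True): 0.7,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.2,
}

# Position in this list is the state number used in Example 4.4.
STATES = [(True, True), (True, False), (False, True), (False, False)]

def build_matrix():
    """Four-state transition matrix derived from the two-day weather rules."""
    P = [[0.0] * 4 for _ in range(4)]
    for i, (today, yesterday) in enumerate(STATES):
        r = RAIN_PROB[(today, yesterday)]
        # Tomorrow's state pairs tomorrow's weather with today's weather.
        P[i][STATES.index((True, today))] += r
        P[i][STATES.index((False, today))] += 1.0 - r
    return P

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

P = build_matrix()
P2 = mat_mul(P, P)
# Rain on Monday and Tuesday is state 0; rain on Thursday means state 0 or 1.
rain_thursday = P2[0][0] + P2[0][1]
print(round(rain_thursday, 4))
```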

Example 4.10 An urn always contains 2 balls. Ball colors are red and blue. At each 
stage a ball is randomly chosen and then replaced by a new ball, which with probability 
0.8 is the same color, and with probability 0.2 is the opposite color, as the ball it replaces. 
If initially both balls are red, find the probability that the fifth ball selected is red. 

Solution: To find the desired probability we first define an appropriate Markov
chain. This can be accomplished by noting that the probability that a selection is
red is determined by the composition of the urn at the time of the selection. So, let
us define $X_n$ to be the number of red balls in the urn after the nth selection and
subsequent replacement. Then $\{X_n, n \ge 0\}$ is a Markov chain with states 0, 1, 2 and
with transition probability matrix P given by

$$\mathbf{P} = \begin{pmatrix} 0.8 & 0.2 & 0 \\ 0.1 & 0.8 & 0.1 \\ 0 & 0.2 & 0.8 \end{pmatrix}$$

To understand the preceding, consider for instance $P_{1,0}$. Now, to go from 1 red ball
in the urn to 0 red balls, the ball chosen must be red (which occurs with probability
0.5) and it must then be replaced by a ball of opposite color (which occurs with
probability 0.2), showing that

$$P_{1,0} = (0.5)(0.2) = 0.1$$

To determine the probability that the fifth selection is red, condition on the number
of red balls in the urn after the fourth selection. This yields

$$\begin{aligned}
P(\text{fifth selection is red}) &= \sum_{i=0}^{2} P(\text{fifth selection is red} \mid X_4 = i)\, P(X_4 = i \mid X_0 = 2) \\
&= (0) P^4_{2,0} + (0.5) P^4_{2,1} + (1) P^4_{2,2} \\
&= 0.5 P^4_{2,1} + P^4_{2,2}
\end{aligned}$$

To calculate the preceding we compute $\mathbf{P}^4$. Doing so yields

$$P^4_{2,1} = 0.4352, \quad P^4_{2,2} = 0.4872$$

giving the answer P(fifth selection is red) = 0.7048. ■
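The urn calculation can be reproduced by building P from the selection-and-replacement rule and then conditioning on $X_4$. The sketch below is illustrative; `urn_matrix`, `mat_mul`, and `mat_pow` are hypothetical helper names, and the parameter `same` stands for the 0.8 probability that the new ball matches the one it replaces.

```python
def urn_matrix(same=0.8, balls=2):
    """Chain on the number of red balls (0..balls) after one select-and-replace."""
    P = [[0.0] * (balls + 1) for _ in range(balls + 1)]
    for i in range(balls + 1):
        red = i / balls                    # probability the chosen ball is red
        P[i][i] += same                    # replacement has the same color
        if i > 0:
            P[i][i - 1] += red * (1 - same)        # red chosen, replaced by blue
        if i < balls:
            P[i][i + 1] += (1 - red) * (1 - same)  # blue chosen, replaced by red
    return P

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(P, n):
    """Return the n-th power of P (n >= 1) by successive multiplication."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

P = urn_matrix()
P4 = mat_pow(P, 4)
# P(fifth selection is red | X_4 = i) = i / 2; the chain starts with 2 red balls.
answer = sum((i / 2) * P4[2][i] for i in range(3))
print(round(answer, 4))
```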

Example 4.11 Suppose that balls are successively distributed among 8 urns, with 
each ball being equally likely to be put in any of these urns. What is the probability 
that there will be exactly 3 nonempty urns after 9 balls have been distributed?

Solution: If we let $X_n$ be the number of nonempty urns after n balls have been
distributed, then $\{X_n, n \ge 0\}$ is a Markov chain with states 0, 1, ..., 8 and transition
probabilities

$$P_{i,i} = i/8 = 1 - P_{i,i+1}, \quad i = 0, 1, \ldots, 8$$

The desired probability is $P^9_{0,3} = P^8_{1,3}$, where the equality follows because $P_{0,1} = 1$.
Now, starting with 1 occupied urn, if we had wanted to determine the entire probability
distribution of the number of occupied urns after 8 additional balls had been
distributed we would need to consider the transition probability matrix with states
1, 2, ..., 8. However, because we only require the probability, starting with a single
occupied urn, that there are 3 occupied urns after an additional 8 balls have been
distributed we can make use of the fact that the state of the Markov chain cannot
decrease to collapse all states 4, 5, ..., 8 into a single state 4 with the interpretation
that the state is 4 whenever four or more of the urns are occupied. Consequently, we
need only determine the eight-step transition probability $P^8_{1,3}$ of the Markov chain
with states 1, 2, 3, 4 having transition probability matrix P given by

$$\mathbf{P} = \begin{pmatrix}
1/8 & 7/8 & 0 & 0 \\
0 & 2/8 & 6/8 & 0 \\
0 & 0 & 3/8 & 5/8 \\
0 & 0 & 0 & 1
\end{pmatrix}$$

Raising the preceding matrix to the power 4 yields the matrix $\mathbf{P}^4$ given by

$$\mathbf{P}^4 = \begin{pmatrix}
0.0002 & 0.0256 & 0.2563 & 0.7178 \\
0 & 0.0039 & 0.0952 & 0.9009 \\
0 & 0 & 0.0198 & 0.9802 \\
0 & 0 & 0 & 1
\end{pmatrix}$$

Hence,

$$P^8_{1,3} = 0.0002 \times 0.2563 + 0.0256 \times 0.0952 + 0.2563 \times 0.0198 + 0.7178 \times 0 = 0.00756$$ ■
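Because every entry of the collapsed matrix is a fraction with denominator 8, the eight-step probability can also be computed exactly. The following Python sketch uses the standard `fractions` module; `occupancy_matrix`, `mat_mul`, and `mat_pow` are illustrative names, not from the text.

```python
from fractions import Fraction

def occupancy_matrix(urns, cap):
    """Chain on states 1..cap (occupied urns); state `cap` means 'cap or more'."""
    P = [[Fraction(0)] * cap for _ in range(cap)]
    for i in range(1, cap):
        P[i - 1][i - 1] = Fraction(i, urns)      # ball lands in an occupied urn
        P[i - 1][i] = 1 - Fraction(i, urns)      # ball lands in an empty urn
    P[cap - 1][cap - 1] = Fraction(1)            # lumped state is absorbing
    return P

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(P, n):
    """Return the n-th power of P (n >= 1) by successive multiplication."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

P = occupancy_matrix(urns=8, cap=4)
answer = mat_pow(P, 8)[0][2]      # from state 1 to state 3 in eight steps
print(answer, float(answer))
```

The exact value is about 0.00757; the figure 0.00756 in the worked solution comes from multiplying entries of $\mathbf{P}^4$ that were already rounded to four decimals.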

Consider a Markov chain with transition probabilities $P_{i,j}$. Let $\mathscr{A}$ be a set of states,
and suppose we are interested in the probability that the Markov chain ever enters any
of the states in $\mathscr{A}$ by time m. That is, for a given state $i \notin \mathscr{A}$, we are interested in
determining

$$\beta = P(X_k \in \mathscr{A} \text{ for some } k = 1, \ldots, m \mid X_0 = i)$$

To determine the preceding probability we will define a Markov chain $\{W_n, n \ge 0\}$
whose states are the states that are not in $\mathscr{A}$ plus an additional state, which we will
call A in our general discussion (though in specific examples we will usually give it a
different name). Once the $\{W_n\}$ Markov chain enters state A it remains there forever.

The new Markov chain is defined as follows. Letting $X_n$ denote the state at time n
of the Markov chain with transition probabilities $P_{i,j}$, define

$$N = \min\{n : X_n \in \mathscr{A}\}$$

and let $N = \infty$ if $X_n \notin \mathscr{A}$ for all n. In words, N is the first time the Markov chain
enters the set of states $\mathscr{A}$. Now, define

$$W_n = \begin{cases} X_n, & \text{if } n < N \\ A, & \text{if } n \ge N \end{cases}$$

So the state of the $\{W_n\}$ process is equal to the state of the original Markov chain up
to the point when the original Markov chain enters a state in $\mathscr{A}$. At that time the new
process goes to state A and remains there forever. From this description it follows that
$\{W_n, n \ge 0\}$ is a Markov chain with states $i$, $i \notin \mathscr{A}$, and A, and with transition probabilities
$Q_{i,j}$ given by

$$\begin{aligned}
Q_{i,j} &= P_{i,j}, && \text{if } i \notin \mathscr{A},\ j \notin \mathscr{A} \\
Q_{i,A} &= \sum_{j \in \mathscr{A}} P_{i,j}, && \text{if } i \notin \mathscr{A} \\
Q_{A,A} &= 1
\end{aligned}$$

Because the original Markov chain will have entered a state in $\mathscr{A}$ by time m if and only
if the state at time m of the new Markov chain is A, we see that

$$\begin{aligned}
P(X_k \in \mathscr{A} \text{ for some } k = 1, \ldots, m \mid X_0 = i)
&= P(W_m = A \mid X_0 = i) = P(W_m = A \mid W_0 = i) = Q^m_{i,A}
\end{aligned}$$

That is, the desired probability is equal to an m-step transition probability of the new
chain.
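The construction of the $\{W_n\}$ chain translates directly into code: keep the rows and columns for states outside $\mathscr{A}$, add one column holding the total probability of jumping into $\mathscr{A}$, and append an absorbing row. The sketch below is an illustration only. The function name `absorb` is hypothetical, and the choice of Gary's chain from Example 4.3 with $\mathscr{A}$ = {glum} and m = 3 is an assumption made here for the demonstration.

```python
def absorb(P, A):
    """Return (Q, kept): Q is the W-chain matrix with the absorbing state last;
    kept lists the original states that survive, in their original order."""
    kept = [i for i in range(len(P)) if i not in A]
    Q = [[P[i][j] for j in kept] + [sum(P[i][j] for j in A)] for i in kept]
    Q.append([0] * len(kept) + [1])
    return Q, kept

def mat_mul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def mat_pow(P, n):
    """Return the n-th power of P (n >= 1) by successive multiplication."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

# Gary's chain from Example 4.3; ask whether he is glum (state 2) on any of
# the next 3 days, given that he is cheerful (state 0) today.
GARY = [[0.5, 0.4, 0.1],
        [0.3, 0.4, 0.3],
        [0.2, 0.3, 0.5]]

Q, kept = absorb(GARY, {2})
beta = mat_pow(Q, 3)[kept.index(0)][-1]
print(round(beta, 4))
```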

Example 4.12 In a sequence of independent flips of a fair coin, let N denote the
number of flips until there is a run of three consecutive heads. Find

(a) $P(N \le 8)$ and

(b) $P(N = 8)$.

Solution: To determine $P(N \le 8)$, define a Markov chain with states 0, 1, 2, 3
where for i < 3 state i means that we currently are on a run of i consecutive heads,
and where state 3 means that a run of three consecutive heads has already occurred.
Thus, the transition probability matrix is

$$\mathbf{P} = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 \\
1/2 & 0 & 0 & 1/2 \\
0 & 0 & 0 & 1
\end{pmatrix}$$

where, for instance, the values for row 2 are obtained by noting that if we currently
are on a run of size 1 then the next state will be 0 if the next flip is a tail, or 2 if it is a
head. Hence, $P_{1,0} = P_{1,2} = 1/2$. Because there would be a run of three consecutive
heads within the first eight flips if and only if $X_8 = 3$, the desired probability is $P^8_{0,3}$.
Squaring P to obtain $\mathbf{P}^2$, then squaring the result to obtain $\mathbf{P}^4$, and then squaring
that matrix gives the result

$$\mathbf{P}^8 = \begin{pmatrix}
81/256 & 44/256 & 24/256 & 107/256 \\
68/256 & 37/256 & 20/256 & 131/256 \\
44/256 & 24/256 & 13/256 & 175/256 \\
0 & 0 & 0 & 1
\end{pmatrix}$$

Hence, the probability that there will be a run of three consecutive heads within the
first eight flips is $107/256 \approx 0.4180$.

(b) One way to obtain $P(N = 8)$, the probability that it takes eight flips to obtain
the first run of three consecutive heads, is to use that

$$P(N = 8) = P(N \le 8) - P(N \le 7) = P^8_{0,3} - P^7_{0,3}$$

Another way to determine $P(N = 8)$ is to consider a Markov chain with states
0, 1, 2, 3, 4 where, as before, for i < 3 state i means that we currently are on a run
of i consecutive heads, state 3 means that the first run of size 3 has just occurred,
and state 4 that a run of size 3 occurred in the past. That is, this Markov chain has
transition probability matrix

$$\mathbf{Q} = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 & 0 \\
1/2 & 0 & 0 & 1/2 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

N will equal 8 if starting in state 0 the preceding Markov chain is in state 3 after
eight transitions. That is, $P(N = 8) = Q^8_{0,3}$. ■
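Both routes to $P(N = 8)$ can be compared with exact rational arithmetic. This is a minimal Python sketch assuming the two matrices shown in the example; `mat_mul` and `mat_pow` are illustrative helper names.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(P, n):
    """Return the n-th power of P (n >= 1) by successive multiplication."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

h = Fraction(1, 2)

# Four-state chain: state 3 means a run of three heads has already occurred.
P = [[h, h, 0, 0],
     [h, 0, h, 0],
     [h, 0, 0, h],
     [0, 0, 0, 1]]

# Five-state chain: state 3 = run just completed, state 4 = completed earlier.
Q = [[h, h, 0, 0, 0],
     [h, 0, h, 0, 0],
     [h, 0, 0, h, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 1]]

p_at_most_8 = mat_pow(P, 8)[0][3]
p_at_most_7 = mat_pow(P, 7)[0][3]
p_exactly_8 = p_at_most_8 - p_at_most_7      # first method
p_exactly_8_alt = mat_pow(Q, 8)[0][3]        # second method

print(p_at_most_8, p_exactly_8, p_exactly_8_alt)
```

The two methods agree because state 3 of the second chain is occupied only at the instant the first run of three heads is completed.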


Suppose now that we want to compute the probability that the $\{X_n, n \ge 0\}$ chain,
starting in state i, enters state j at time m without ever entering any of the states in $\mathscr{A}$,
where neither i nor j is in $\mathscr{A}$. That is, for $i, j \notin \mathscr{A}$, we are interested in

$$\alpha = P(X_m = j, X_k \notin \mathscr{A}, k = 1, \ldots, m - 1 \mid X_0 = i)$$

Noting that the event that $X_m = j$, $X_k \notin \mathscr{A}$, $k = 1, \ldots, m - 1$ is equivalent to the
event that $W_m = j$, it follows that for $i, j \notin \mathscr{A}$,

$$\begin{aligned}
P(X_m = j, X_k \notin \mathscr{A}, k = 1, \ldots, m - 1 \mid X_0 = i) &= P(W_m = j \mid X_0 = i) \\
&= P(W_m = j \mid W_0 = i) = Q^m_{i,j}
\end{aligned}$$


Example 4.13 Consider a Markov chain with states 1, 2, 3, 4, 5, and suppose that
we want to compute

$$P(X_4 = 2, X_3 \le 2, X_2 \le 2, X_1 \le 2 \mid X_0 = 1)$$

That is, we want the probability that, starting in state 1, the chain is in state 2 at time 4
and has never entered any of the states in the set $\mathscr{A} = \{3, 4, 5\}$.

To compute this probability all we need to know are the transition probabilities
$P_{1,1}, P_{1,2}, P_{2,1}, P_{2,2}$. So, suppose that

$$\begin{aligned}
P_{1,1} &= 0.3 & P_{1,2} &= 0.3 \\
P_{2,1} &= 0.1 & P_{2,2} &= 0.2
\end{aligned}$$

Then we consider the Markov chain having states 1, 2, 3 (we are giving state A the
name 3), and having the transition probability matrix Q as follows:

$$\mathbf{Q} = \begin{pmatrix} 0.3 & 0.3 & 0.4 \\ 0.1 & 0.2 & 0.7 \\ 0 & 0 & 1 \end{pmatrix}$$

The desired probability is $Q^4_{1,2}$. Raising Q to the power 4 yields the matrix

$$\mathbf{Q}^4 = \begin{pmatrix} 0.0219 & 0.0285 & 0.9496 \\ 0.0095 & 0.0124 & 0.9781 \\ 0 & 0 & 1 \end{pmatrix}$$

Hence, the desired probability is $\alpha = 0.0285$. ■
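The same number can be obtained without matrices by summing the probabilities of every path that starts in state 1, ends in state 2 at time 4, and stays inside {1, 2} in between. This brute-force Python sketch is an independent check rather than the method of the text; `taboo_probability` is a hypothetical name.

```python
from itertools import product

# The only transition probabilities needed in Example 4.13.
P = {(1, 1): 0.3, (1, 2): 0.3,
     (2, 1): 0.1, (2, 2): 0.2}

def taboo_probability(P, start, end, m, allowed):
    """Sum the probabilities of all m-step paths from start to end whose
    intermediate states X_1, ..., X_{m-1} all lie in `allowed`."""
    total = 0.0
    for middle in product(allowed, repeat=m - 1):
        path = (start, *middle, end)
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= P[(a, b)]
        total += prob
    return total

alpha = taboo_probability(P, start=1, end=2, m=4, allowed=(1, 2))
print(round(alpha, 4))
```

Enumerating paths costs time exponential in m, so the matrix power $\mathbf{Q}^m$ is the practical method; the enumeration merely makes explicit which paths that entry of $\mathbf{Q}^m$ adds up.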

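When $i \notin \mathscr{A}$ but $j \in \mathscr{A}$ we can determine the probability

$$\alpha = P(X_m = j, X_k \notin \mathscr{A}, k = 1, \ldots, m - 1 \mid X_0 = i)$$

as follows.

$$\begin{aligned}
\alpha &= \sum_{r \notin \mathscr{A}} P(X_m = j, X_{m-1} = r, X_k \notin \mathscr{A}, k = 1, \ldots, m - 2 \mid X_0 = i) \\
&= \sum_{r \notin \mathscr{A}} P(X_m = j \mid X_{m-1} = r, X_k \notin \mathscr{A}, k = 1, \ldots, m - 2, X_0 = i) \\
&\qquad \times P(X_{m-1} = r, X_k \notin \mathscr{A}, k = 1, \ldots, m - 2 \mid X_0 = i) \\
&= \sum_{r \notin \mathscr{A}} P_{r,j}\, P(X_{m-1} = r, X_k \notin \mathscr{A}, k = 1, \ldots, m - 2 \mid X_0 = i) \\
&= \sum_{r \notin \mathscr{A}} P_{r,j}\, Q^{m-1}_{i,r}
\end{aligned}$$

Also, when $i \in \mathscr{A}$ we could determine

$$\alpha = P(X_m = j, X_k \notin \mathscr{A}, k = 1, \ldots, m - 1 \mid X_0 = i)$$

by conditioning on the first transition to obtain

$$\begin{aligned}
\alpha &= \sum_{r \notin \mathscr{A}} P(X_m = j, X_k \notin \mathscr{A}, k = 1, \ldots, m - 1 \mid X_0 = i, X_1 = r)\, P(X_1 = r \mid X_0 = i) \\
&= \sum_{r \notin \mathscr{A}} P(X_{m-1} = j, X_k \notin \mathscr{A}, k = 1, \ldots, m - 2 \mid X_0 = r)\, P_{i,r}
\end{aligned}$$

For instance, if $i \in \mathscr{A}$, $j \notin \mathscr{A}$ then the preceding equation yields

$$P(X_m = j, X_k \notin \mathscr{A}, k = 1, \ldots, m - 1 \mid X_0 = i) = \sum_{r \notin \mathscr{A}} Q^{m-1}_{r,j} P_{i,r}$$

We can also compute the conditional probability of $X_n$ given that the chain starts in
state i and has not entered any state in $\mathscr{A}$ by time n, as follows. For $i, j \notin \mathscr{A}$,

$$\begin{aligned}
P\{X_n = j \mid X_0 = i, X_k \notin \mathscr{A}, k = 1, \ldots, n\}
&= \frac{P\{X_n = j, X_k \notin \mathscr{A}, k = 1, \ldots, n \mid X_0 = i\}}{P\{X_k \notin \mathscr{A}, k = 1, \ldots, n \mid X_0 = i\}} \\
&= \frac{Q^n_{i,j}}{\sum_{r \notin \mathscr{A}} Q^n_{i,r}}
\end{aligned}$$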

Remark So far, all of the probabilities we have considered are conditional probabilities.
For instance, $P^n_{ij}$ is the probability that the state at time n is j given that the initial
state at time 0 is i. If the unconditional distribution of the state at time n is desired,
it is necessary to specify the probability distribution of the initial state. Let us denote
this by

$$\alpha_i = P\{X_0 = i\}, \quad i \ge 0 \qquad \left( \sum_{i=0}^{\infty} \alpha_i = 1 \right)$$

All unconditional probabilities may be computed by conditioning on the initial state.
That is,

$$\begin{aligned}
P\{X_n = j\} &= \sum_{i=0}^{\infty} P\{X_n = j \mid X_0 = i\}\, P\{X_0 = i\} \\
&= \sum_{i=0}^{\infty} P^n_{ij}\, \alpha_i
\end{aligned}$$

For instance, if $\alpha_0 = 0.4$, $\alpha_1 = 0.6$, in Example 4.8, then the (unconditional)
probability that it will rain four days after we begin keeping weather records is

$$\begin{aligned}
P\{X_4 = 0\} &= 0.4 P^4_{00} + 0.6 P^4_{10} \\
&= (0.4)(0.5749) + (0.6)(0.5668) \\
&= 0.5700
\end{aligned}$$
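In matrix terms the remark says that the distribution at time n is the row vector of initial probabilities multiplied by $\mathbf{P}^n$. A minimal Python sketch of that computation for the numbers above follows; the helper names `mat_mul` and `distribution_at` are illustrative assumptions.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def distribution_at(alpha, P, n):
    """Return the distribution of X_n given the initial distribution alpha."""
    row = [list(alpha)]            # treat alpha as a 1 x k matrix
    for _ in range(n):
        row = mat_mul(row, P)      # one transition per multiplication
    return row[0]

# Weather chain of Example 4.8 with initial distribution (0.4, 0.6).
P = [[0.7, 0.3],
     [0.4, 0.6]]
alpha = [0.4, 0.6]

dist4 = distribution_at(alpha, P, 4)
print([round(x, 4) for x in dist4])
```

Multiplying the row vector by P four times avoids forming $\mathbf{P}^4$ explicitly, which is cheaper when only one initial distribution is of interest.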


4.3 Classification of States 

State j is said to be accessible from state i if $P^n_{ij} > 0$ for some $n \ge 0$. Note that
this implies that state j is accessible from state i if and only if, starting in i, it is
possible that the process will ever enter state j. This is true since if j is not accessible
from i, then

$$\begin{aligned}
P\{\text{ever be in } j \mid \text{start in } i\} &= P\left\{ \bigcup_{n=0}^{\infty} \{X_n = j\} \,\Big|\, X_0 = i \right\} \\
&\le \sum_{n=0}^{\infty} P\{X_n = j \mid X_0 = i\} \\
&= \sum_{n=0}^{\infty} P^n_{ij} \\
&= 0
\end{aligned}$$

Two states i and j that are accessible to each other are said to communicate, and we
write $i \leftrightarrow j$.

Note that any state communicates with itself since, by definition,

$$P^0_{ii} = P\{X_0 = i \mid X_0 = i\} = 1$$

The relation of communication satisfies the following three properties:

(i) State i communicates with state i, all $i \ge 0$.

(ii) If state i communicates with state j, then state j communicates with state i.

(iii) If state i communicates with state j, and state j communicates with state k, then
state i communicates with state k.

Properties (i) and (ii) follow immediately from the definition of communication. To
prove (iii) suppose that i communicates with j, and j communicates with k. Thus, there
exist integers n and m such that $P^n_{ij} > 0$, $P^m_{jk} > 0$. Now by the Chapman-Kolmogorov
equations, we have

$$P^{n+m}_{ik} = \sum_{r=0}^{\infty} P^n_{ir} P^m_{rk} \ge P^n_{ij} P^m_{jk} > 0$$

Hence, state k is accessible from state i. Similarly, we can show that state i is accessible
from state k. Hence, states i and k communicate.

Two states that communicate are said to be in the same class. It is an easy consequence
of (i), (ii), and (iii) that any two classes of states are either identical or disjoint.
In other words, the concept of communication divides the state space up into a number
of separate classes. The Markov chain is said to be irreducible if there is only one class,
that is, if all states communicate with each other.

Example 4.14 Consider the Markov chain consisting of the three states 0, 1, 2 and
having transition probability matrix

$$\mathbf{P} = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & 0 \\
\frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\
0 & \frac{1}{3} & \frac{2}{3}
\end{bmatrix}$$

It is easy to verify that this Markov chain is irreducible. For example, it is possible to
go from state 0 to state 2 since

$$0 \to 1 \to 2$$

That is, one way of getting from state 0 to state 2 is to go from state 0 to state 1 (with
probability $\frac{1}{2}$) and then go from state 1 to state 2 (with probability $\frac{1}{4}$). ■

Example 4.15 Consider a Markov chain consisting of the four states 0, 1, 2, 3 and
having transition probability matrix

$$\mathbf{P} = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\
0 & 0 & 0 & 1
\end{bmatrix}$$

The classes of this Markov chain are {0, 1}, {2}, and {3}. Note that while state 0
(or 1) is accessible from state 2, the reverse is not true. Since state 3 is an absorbing
state, that is, $P_{33} = 1$, no other state is accessible from it. ■
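For a finite chain, the classes can be found mechanically: compute which states are accessible from which (a transitive closure over the positive entries of P), then group states that are mutually accessible. The Python sketch below applies this to the matrix of Example 4.15. It is an illustration under stated assumptions: `communicating_classes` is a hypothetical name, and the closure loop is the standard Warshall procedure rather than anything prescribed by the text.

```python
def communicating_classes(P):
    """Return the communicating classes of a finite chain as a list of sets."""
    n = len(P)
    # reach[i][j]: j is accessible from i (every state reaches itself in 0 steps).
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):                       # Warshall transitive closure
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    classes = []
    for i in range(n):
        cls = {j for j in range(n) if reach[i][j] and reach[j][i]}
        if cls not in classes:
            classes.append(cls)
    return classes

# Transition matrix of Example 4.15.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.25, 0.25, 0.25, 0.25],
     [0.0, 0.0, 0.0, 1.0]]

print(communicating_classes(P))
```

A chain is irreducible exactly when this function returns a single class.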

For any state i we let $f_i$ denote the probability that, starting in state i, the process
will ever reenter state i. State i is said to be recurrent if $f_i = 1$ and transient if $f_i < 1$.

Suppose that the process starts in state i and i is recurrent. Hence, with probability
1, the process will eventually reenter state i. However, by the definition of a Markov
chain, it follows that the process will be starting over again when it reenters state i and,
therefore, state i will eventually be visited again. Continual repetition of this argument
leads to the conclusion that if state i is recurrent then, starting in state i, the process
will reenter state i again and again and again; in fact, infinitely often.

On the other hand, suppose that state i is transient. Hence, each time the process
enters state i there will be a positive probability, namely, $1 - f_i$, that it will never again
enter that state. Therefore, starting in state i, the probability that the process will be in
state i for exactly n time periods equals $f_i^{n-1}(1 - f_i)$, $n \ge 1$. In other words, if state
i is transient then, starting in state i, the number of time periods that the process will
be in state i has a geometric distribution with finite mean $1/(1 - f_i)$.
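
The stated mean can be verified in one line. As a short worked check (the numerical value $f_i = 0.8$ below is only an illustrative assumption), differentiate the geometric series $\sum_{n \ge 0} f_i^n = 1/(1 - f_i)$ term by term:

$$\sum_{n=1}^{\infty} n f_i^{\,n-1} (1 - f_i) = (1 - f_i) \frac{d}{d f_i} \left( \frac{1}{1 - f_i} \right) = (1 - f_i) \cdot \frac{1}{(1 - f_i)^2} = \frac{1}{1 - f_i}$$

For instance, if $f_i = 0.8$, the process is expected to spend $1/(1 - 0.8) = 5$ time periods in state i, counting the initial one.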

From the preceding two paragraphs, it follows that state i is recurrent if and only if, 
starting in state i, the expected number of time periods that the process is in state i is 
infinite. But, letting 

$$
I_n = \begin{cases} 1, & \text{if } X_n = i \\ 0, & \text{if } X_n \ne i \end{cases}
$$

we have that $\sum_{n=0}^{\infty} I_n$ represents the number of periods that the process is in state i. 
Also, 

$$
E\left[\sum_{n=0}^{\infty} I_n \,\Big|\, X_0 = i\right]
= \sum_{n=0}^{\infty} E[I_n \mid X_0 = i]
= \sum_{n=0}^{\infty} P\{X_n = i \mid X_0 = i\}
= \sum_{n=0}^{\infty} P_{ii}^n
$$

We have thus proven the following. 

Proposition 4.1 State i is 

recurrent if $\sum_{n=1}^{\infty} P_{ii}^n = \infty$, 

transient if $\sum_{n=1}^{\infty} P_{ii}^n < \infty$ 
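A short numerical illustration of Proposition 4.1 follows. It is a sketch only: the helper names are hypothetical, and the chain used is the four-state matrix of Example 4.15, in which state 0 is recurrent and state 2 is transient. The partial sums of $P_{ii}^n$ are read off successive matrix powers.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def return_prob_partial_sum(P, i, N):
    """Return sum_{n=1}^{N} P_ii^n, taking P_ii^n from the n-th power of P."""
    total, Pn = F(0), P
    for _ in range(N):
        total += Pn[i][i]
        Pn = mat_mul(Pn, P)
    return total

half, quarter = F(1, 2), F(1, 4)
P = [
    [half, half, 0, 0],
    [half, half, 0, 0],
    [quarter, quarter, quarter, quarter],
    [0, 0, 0, 1],
]
for N in (5, 10, 20):
    # The sum for state 0 grows linearly in N; the sum for state 2 stays below 1/3.
    print(N, return_prob_partial_sum(P, 0, N), float(return_prob_partial_sum(P, 2, N)))
```

For state 2 here, $P_{22}^n = (1/4)^n$, so the sum converges to 1/3. Adding the initial period gives an expected 4/3 periods in state 2, which agrees with the geometric mean $1/(1 - f_2)$ for $f_2 = 1/4$.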

The argument leading to the preceding proposition is doubly important because it 
also shows that a transient state will only be visited a finite number of times (hence 
the name transient). This leads to the conclusion that in a finite-state Markov chain not 
all states can be transient. To see this, suppose the states are 0, 1, ..., M and suppose 
that they are all transient. Then after a finite amount of time (say, after time $T_0$) state 
0 will never be visited, and after a time (say, $T_1$) state 1 will never be visited, and 
after a time (say, $T_2$) state 2 will never be visited, and so on. Thus, after a finite time 
$T = \max\{T_0, T_1, \ldots, T_M\}$ no states will be visited. But as the process must be in some 
state after time T we arrive at a contradiction, which shows that at least one of the 
states must be recurrent. 

Another use of Proposition 4.1 is that it enables us to show that recurrence is a class 
property. 

Corollary 4.2 If state i is recurrent, and state i communicates with state j , then state 
j is recurrent. 

Proof. To prove this we first note that, since state i communicates with state j, there 
exist integers k and m such that $P_{ij}^k > 0$, $P_{ji}^m > 0$. Now, for any integer n 

$$
P_{jj}^{m+n+k} \ge P_{ji}^m P_{ii}^n P_{ij}^k
$$

This follows since the left side of the preceding is the probability of going from j to 
j in m + n + k steps, while the right side is the probability of going from j to j in 
m + n + k steps via a path that goes from j to i in m steps, then from i to i in an 
additional n steps, then from i to j in an additional k steps. 

From the preceding we obtain, by summing over n, that 

$$
\sum_{n=1}^{\infty} P_{jj}^{m+n+k} \ge P_{ji}^m P_{ij}^k \sum_{n=1}^{\infty} P_{ii}^n = \infty
$$

since $P_{ji}^m P_{ij}^k > 0$ and $\sum_{n=1}^{\infty} P_{ii}^n$ is infinite since state i is recurrent. Thus, by Proposition 
4.1 it follows that state j is also recurrent. ■ 

Remarks 

(i) Corollary 4.2 also implies that transience is a class property. For if state i is transient 
and communicates with state j, then state j must also be transient. For if j were 
recurrent then, by Corollary 4.2, i would also be recurrent and hence could not be 
transient. 

(ii) Corollary 4.2 along with our previous result that not all states in a finite Markov 
chain can be transient leads to the conclusion that all states of a finite irreducible 
Markov chain are recurrent. 

Example 4.16 Let the Markov chain consisting of the states 0, 1, 2, 3 have the 
transition probability matrix 

$$
P = \begin{pmatrix}
0 & 0 & 1/2 & 1/2 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
$$


Determine which states are transient and which are recurrent. 

Solution: It is a simple matter to check that all states communicate and, hence, 
since this is a finite chain, all states must be recurrent. ■ 

Example 4.17 Consider the Markov chain having states 0, 1, 2, 3, 4 and 

$$
P = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 \\
1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 \\
1/4 & 1/4 & 0 & 0 & 1/2
\end{pmatrix}
$$

Determine the recurrent states. 

Solution: This chain consists of the three classes {0, 1}, {2, 3}, and {4}. The first 
two classes are recurrent and the third transient. ■ 
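For a finite chain, the classification in Examples 4.16 and 4.17 can be automated. The sketch below is illustrative only (the name `classify_states` is hypothetical). It relies on a standard fact that is not stated in the text above: in a finite-state chain, state i is recurrent exactly when every state accessible from i can also reach i, that is, when the class of i cannot be left.

```python
from fractions import Fraction as F

def classify_states(P):
    """Return (recurrent, transient) lists of states for a finite chain."""
    n = len(P)
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):  # transitive closure of the one-step accessibility relation
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    recurrent, transient = [], []
    for i in range(n):
        # Finite chain: i is recurrent iff every state reachable from i leads back to i.
        closed = all(reach[j][i] for j in range(n) if reach[i][j])
        (recurrent if closed else transient).append(i)
    return recurrent, transient

h, q = F(1, 2), F(1, 4)
P_416 = [[0, 0, h, h], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
P_417 = [
    [h, h, 0, 0, 0],
    [h, h, 0, 0, 0],
    [0, 0, h, h, 0],
    [0, 0, h, h, 0],
    [q, q, 0, 0, h],
]
print(classify_states(P_416))  # ([0, 1, 2, 3], [])
print(classify_states(P_417))  # ([0, 1, 2, 3], [4])
```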

Example 4.18 (A Random Walk) Consider a Markov chain whose state space consists 
of the integers i = 0, ±1, ±2, ..., and has transition probabilities given by 

$$
P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 0, \pm 1, \pm 2, \ldots
$$

where 0 < p < 1. In other words, on each transition the process either moves one step 
to the right (with probability p) or one step to the left (with probability 1 — p). One 
colorful interpretation of this process is that it represents the wanderings of a drunken 
man as he walks along a straight line. Another is that it represents the winnings of a 
gambler who on each play of the game either wins or loses one dollar. 

Since all states clearly communicate, it follows from Corollary 4.2 that they are 
either all transient or all recurrent. So let us consider state 0 and attempt to determine 
if $\sum_{n=1}^{\infty} P_{00}^n$ is finite or infinite. 

Since it is impossible to be even (using the gambling model interpretation) after an 
odd number of plays we must, of course, have that 

$$
P_{00}^{2n-1} = 0, \quad n = 1, 2, \ldots
$$


On the other hand, we would be even after 2n trials if and only if we won n of these 
and lost n of these. Because each play of the game results in a win with probability p 
and a loss with probability 1 - p, the desired probability is thus the binomial probability 

$$
P_{00}^{2n} = \binom{2n}{n} p^n (1-p)^n = \frac{(2n)!}{n!\,n!}\,(p(1-p))^n, \quad n = 1, 2, 3, \ldots
$$

By using an approximation, due to Stirling, which asserts that 

$$
n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi} \tag{4.3}
$$

where we say that $a_n \sim b_n$ when $\lim_{n\to\infty} a_n/b_n = 1$, we obtain 

$$
P_{00}^{2n} \sim \frac{(4p(1-p))^n}{\sqrt{\pi n}}
$$

Now it is easy to verify, for positive $a_n$, $b_n$, that if $a_n \sim b_n$, then $\sum_n a_n < \infty$ if and 
only if $\sum_n b_n < \infty$. Hence, $\sum_{n=1}^{\infty} P_{00}^n$ will converge if and only if 

$$
\sum_{n=1}^{\infty} \frac{(4p(1-p))^n}{\sqrt{\pi n}}
$$

does. However, $4p(1-p) \le 1$ with equality holding if and only if $p = 1/2$. Hence, 
$\sum_{n=1}^{\infty} P_{00}^n = \infty$ if and only if $p = 1/2$. Thus, the chain is recurrent when $p = 1/2$ and 
transient if $p \ne 1/2$. 
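The asymptotic form of $P_{00}^{2n}$ and the resulting dichotomy can be examined numerically. The following is a rough sketch with hypothetical function names. It compares the exact binomial probability with the Stirling-based approximation and prints partial sums for a symmetric and a nonsymmetric walk. The choice p = 0.6 and the cutoffs 100 and 400 are arbitrary illustration values.

```python
from math import comb, pi, sqrt

def p00_exact(n, p):
    """Exact probability of being back at 0 after 2n steps."""
    return comb(2 * n, n) * (p * (1 - p)) ** n

def p00_stirling(n, p):
    """Approximation (4p(1-p))^n / sqrt(pi n) obtained from Stirling's formula."""
    return (4 * p * (1 - p)) ** n / sqrt(pi * n)

for p in (0.5, 0.6):
    ratio = p00_exact(200, p) / p00_stirling(200, p)  # should be close to 1
    partial_sums = [sum(p00_exact(n, p) for n in range(1, N + 1)) for N in (100, 400)]
    print(p, round(ratio, 4), [round(s, 3) for s in partial_sums])
```

With p = 1/2 the partial sums keep growing (roughly like $2\sqrt{N/\pi}$), whereas with p = 0.6 they level off, in line with recurrence in the first case and transience in the second.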

When $p = 1/2$, the preceding process is called a symmetric random walk. We could 
also look at symmetric random walks in more than one dimension. For instance, in the 
two-dimensional symmetric random walk the process would, at each transition, either 
take one step to the left, right, up, or down, each having probability 1/4. That is, the state 
is the pair of integers (i, j) and the transition probabilities are given by 

$$
P_{(i,j),(i+1,j)} = P_{(i,j),(i-1,j)} = P_{(i,j),(i,j+1)} = P_{(i,j),(i,j-1)} = \frac{1}{4}
$$

By using the same method as in the one-dimensional case, we now show that this 
Markov chain is also recurrent. 

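Since the preceding chain is irreducible, it follows that all states will be recurrent 
if state 0 = (0, 0) is recurrent. So consider $P_{00}^{2n}$. Now after 2n steps, the chain will be 
back in its original location if for some i, $0 \le i \le n$, the 2n steps consist of i steps to 
the left, i to the right, n - i up, and n - i down. Since each step will be either of these 
four types with probability 1/4, it follows that the desired probability is a multinomial 
probability. That is, 

$$
\begin{aligned}
P_{00}^{2n} &= \sum_{i=0}^{n} \frac{(2n)!}{i!\,i!\,(n-i)!\,(n-i)!} \left(\frac{1}{4}\right)^{2n} \\
&= \sum_{i=0}^{n} \frac{(2n)!}{n!\,n!} \cdot \frac{n!}{(n-i)!\,i!} \cdot \frac{n!}{(n-i)!\,i!} \left(\frac{1}{4}\right)^{2n} \\
&= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \sum_{i=0}^{n} \binom{n}{i}\binom{n}{n-i} \\
&= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \binom{2n}{n}
\end{aligned} \tag{4.4}
$$

where the last equality uses the combinatorial identity 

$$
\binom{2n}{n} = \sum_{i=0}^{n} \binom{n}{i}\binom{n}{n-i}
$$

which follows upon noting that both sides represent the number of subgroups of size n 
one can select from a set of n white and n black objects. Now, 

$$
\begin{aligned}
\binom{2n}{n} &= \frac{(2n)!}{n!\,n!} \\
&\sim \frac{(2n)^{2n+1/2} e^{-2n} \sqrt{2\pi}}{n^{2n+1} e^{-2n} (2\pi)} \quad \text{by Stirling's approximation} \\
&= \frac{4^n}{\sqrt{\pi n}}
\end{aligned}
$$

Hence, from Equation (4.4) we see that 

$$
P_{00}^{2n} \sim \frac{1}{\pi n}
$$

which shows that $\sum_n P_{00}^{2n} = \infty$, and thus all states are recurrent. 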
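As a check on the two-dimensional calculation, the multinomial sum can be compared with the closed form $\left(\binom{2n}{n}/4^n\right)^2$ in exact arithmetic. This is an illustrative sketch; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, pi

def p00_2d_multinomial(n):
    """Sum the multinomial probabilities over i left/right steps and n - i up/down steps."""
    total = sum(
        Fraction(factorial(2 * n), factorial(i) ** 2 * factorial(n - i) ** 2)
        for i in range(n + 1)
    )
    return total * Fraction(1, 4) ** (2 * n)

def p00_2d_closed(n):
    """Closed form obtained from the combinatorial identity."""
    return Fraction(comb(2 * n, n), 4 ** n) ** 2

for n in (1, 2, 5, 50):
    exact = p00_2d_closed(n)
    # The last column is P_00^{2n} * (pi n), which should approach 1 as n grows.
    print(n, p00_2d_multinomial(n) == exact, float(exact) * pi * n)
```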

Interestingly enough, whereas the symmetric random walks in one and two dimensions 
are both recurrent, all higher-dimensional symmetric random walks turn out to 
be transient. (For instance, the three-dimensional symmetric random walk is at each 
transition equally likely to move in any of six ways—either to the left, right, up, down, 
in, or out.) ■ 

Remark For the one-dimensional random walk of Example 4.18 here is a direct 
argument for establishing recurrence in the symmetric case, and for determining the 
probability that it ever returns to state 0 in the nonsymmetric case. Let 

$$
\beta = P\{\text{ever return to } 0\}
$$

To determine $\beta$, start by conditioning on the initial transition to obtain 

$$
\beta = P\{\text{ever return to } 0 \mid X_1 = 1\}\,p + P\{\text{ever return to } 0 \mid X_1 = -1\}(1 - p) \tag{4.5}
$$

Now, let $\alpha$ denote the probability that the Markov chain will ever return to state 0 given 
that it is currently in state 1. Because the Markov chain will always increase by 1 with 
probability p or decrease by 1 with probability 1 - p no matter what its current state, 
note that $\alpha$ is also the probability that the Markov chain currently in state i will ever 
enter state i - 1, for any i. To obtain an equation for $\alpha$, condition on the next transition 
to obtain 

$$
\begin{aligned}
\alpha &= P\{\text{ever return} \mid X_1 = 1, X_2 = 0\}(1 - p) + P\{\text{ever return} \mid X_1 = 1, X_2 = 2\}\,p \\
&= 1 - p + P\{\text{ever return} \mid X_1 = 1, X_2 = 2\}\,p \\
&= 1 - p + p\alpha^2
\end{aligned}
$$

where the final equation follows by noting that in order for the chain to ever go from 
state 2 to state 0 it must first go to state 1 (and the probability of that ever happening 
is $\alpha$), and if it does eventually go to state 1 then it must still go to state 0 (and the 
conditional probability of that ever happening is also $\alpha$). Therefore, 

$$
\alpha = 1 - p + p\alpha^2
$$

The two roots of this equation are $\alpha = 1$ and $\alpha = (1 - p)/p$. Consequently, in the 
case of the symmetric random walk where p = 1/2 we can conclude that $\alpha = 1$. By 
symmetry, the probability that the symmetric random walk will ever enter state 0 given 
that it is currently in state -1 is also 1, proving that the symmetric random walk is 
recurrent. 

Suppose now that p > 1/2. In this case, it can be shown (see Exercise 17 at the end 
of this chapter) that $P\{\text{ever return to } 0 \mid X_1 = -1\} = 1$. Consequently, Equation (4.5) 
reduces to 

$$
\beta = \alpha p + 1 - p
$$

Because the random walk is transient in this case we know that $\beta < 1$, showing that 
$\alpha \ne 1$. Therefore, $\alpha = (1 - p)/p$, yielding that 

$$
\beta = 2(1 - p), \quad p > 1/2
$$

Similarly, when p < 1/2 we can show that $\beta = 2p$. Thus, in general 

$$
P\{\text{ever return to } 0\} = 2 \min(p, 1 - p) \qquad ■
$$
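The formula 2 min(p, 1 - p) can be compared with a simulation. The sketch below is only a Monte Carlo illustration under stated assumptions: each walk is cut off after `max_steps` steps (so the estimate is slightly biased downward), and the function name, trial count, cutoff, and seed are all arbitrary choices made here. The nonsymmetric values p = 0.3 and p = 0.6 are used because late returns are then extremely unlikely.

```python
import random

def estimate_return_probability(p, trials=10000, max_steps=300, seed=0):
    """Estimate P{ever return to 0} by simulating walks truncated at max_steps."""
    rng = random.Random(seed)
    returned = 0
    for _ in range(trials):
        position = 0
        for _ in range(max_steps):
            position += 1 if rng.random() < p else -1
            if position == 0:  # first return to the origin
                returned += 1
                break
    return returned / trials

estimates = {p: estimate_return_probability(p) for p in (0.3, 0.6)}
for p, est in estimates.items():
    print(p, est, 2 * min(p, 1 - p))  # simulated value next to the formula
```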

Example 4.19 (On the Ultimate Instability of the Aloha Protocol) Consider a 
communications facility in which the numbers of messages arriving during each of the 
time periods n = 1, 2, ... are independent and identically distributed random variables. 
Let $a_i = P\{i \text{ arrivals}\}$, and suppose that $a_0 + a_1 < 1$. Each arriving message will 
transmit at the end of the period in which it arrives. If exactly one message is transmitted, 
then the transmission is successful and the message leaves the system. However, if at 
any time two or more messages simultaneously transmit, then a collision is deemed 
to occur and these messages remain in the system. Once a message is involved in a 
collision it will, independently of all else, transmit at the end of each additional period 
with probability p. This is the so-called Aloha protocol (because it was first instituted at the 
University of Hawaii). We will show that such a system is asymptotically unstable in 
the sense that the number of successful transmissions will, with probability 1, be finite. 

To begin let $X_n$ denote the number of messages in the facility at the beginning of 
the nth period, and note that $\{X_n, n \ge 0\}$ is a Markov chain. Now for $k \ge 0$ define the 
indicator variables $I_k$ by 

$$
I_k = \begin{cases}
1, & \text{if the first time that the chain departs state } k \text{ it directly goes to state } k - 1 \\
0, & \text{otherwise}
\end{cases}
$$

and let it be 0 if the system is never in state k, $k \ge 0$. (For instance, if the successive 
states are 0, 1, 3, 4, ..., then $I_3 = 0$ since when the chain first departs state 3 it goes to 
state 4; whereas, if they are 0, 3, 3, 2, ..., then $I_3 = 1$ since this time it goes to state 2.) 
Now, 

$$
\begin{aligned}
E\left[\sum_{k=0}^{\infty} I_k\right] &= \sum_{k=0}^{\infty} E[I_k] \\
&= \sum_{k=0}^{\infty} P\{I_k = 1\} \\
&\le \sum_{k=0}^{\infty} P\{I_k = 1 \mid k \text{ is ever visited}\}
\end{aligned} \tag{4.6}
$$


Now, $P\{I_k = 1 \mid k \text{ is ever visited}\}$ is the probability that when state k is departed the 
next state is k - 1. That is, it is the conditional probability that a transition from k is to 
k - 1 given that it is not back into k, and so 

$$
P\{I_k = 1 \mid k \text{ is ever visited}\} = \frac{P_{k,k-1}}{1 - P_{k,k}}
$$

Because 

$$
\begin{aligned}
P_{k,k-1} &= a_0 k p (1 - p)^{k-1}, \\
P_{k,k} &= a_0 [1 - k p (1 - p)^{k-1}] + a_1 (1 - p)^k
\end{aligned}
$$

which is seen by noting that if there are k messages present on the beginning of a 
day, then (a) there will be k — 1 at the beginning of the next day if there are no new 
messages that day and exactly one of the k messages transmits; and (b) there will be k 
at the beginning of the next day if either 

(i) there are no new messages and it is not the case that exactly one of the existing k 
messages transmits, or 
(ii) there is exactly one new message (which automatically transmits) and none of the 
other k messages transmits. 

Substitution of the preceding into Equation (4.6) yields 

$$
E\left[\sum_{k=0}^{\infty} I_k\right] \le \sum_{k=0}^{\infty} \frac{a_0 k p (1 - p)^{k-1}}{1 - a_0 [1 - k p (1 - p)^{k-1}] - a_1 (1 - p)^k} < \infty
$$

where the convergence follows by noting that when k is large the denominator of 
the expression in the preceding sum converges to $1 - a_0$ and so the convergence or 
divergence of the sum is determined by whether or not the sum of the terms in the 
numerator converges, and $\sum_{k=0}^{\infty} k (1 - p)^{k-1} < \infty$. 

Hence, $E\left[\sum_{k=0}^{\infty} I_k\right] < \infty$, which implies that $\sum_{k=0}^{\infty} I_k < \infty$ with probability 1 (for 
if there was a positive probability that $\sum_{k=0}^{\infty} I_k$ could be $\infty$, then its mean would be $\infty$). 
Hence, with probability 1, there will be only a finite number of states that are initially 
departed via a successful transmission; or equivalently, there will be some finite integer 
N such that whenever there are N or more messages in the system, there will never 
again be a successful transmission. From this (and the fact that such higher states will 
eventually be reached—why?) it follows that, with probability 1, there will only be a 
finite number of successful transmissions. ■ 
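The convergence of the bound on $E[\sum_k I_k]$ can be seen numerically. The sketch below is illustrative only: the values $a_0 = 0.3$, $a_1 = 0.3$, and p = 0.2 are assumptions chosen for the example (any values with $a_0 + a_1 < 1$ and 0 < p < 1 would do), and the function name is hypothetical.

```python
def aloha_bound_terms(a0, a1, p, K):
    """Terms P_{k,k-1} / (1 - P_{k,k}) for k = 0..K, using the formulas of Example 4.19."""
    terms = []
    for k in range(K + 1):
        # Probability that exactly one of the k waiting messages transmits.
        one_transmits = k * p * (1 - p) ** (k - 1) if k > 0 else 0.0
        down = a0 * one_transmits                              # P_{k,k-1}
        stay = a0 * (1 - one_transmits) + a1 * (1 - p) ** k    # P_{k,k}
        terms.append(down / (1 - stay))
    return terms

# Assumed illustration values, not taken from the text.
for K in (25, 50, 100, 200):
    print(K, sum(aloha_bound_terms(0.3, 0.3, 0.2, K)))
```

The partial sums stabilize quickly because the numerator decays geometrically in k while the denominator stays bounded away from 0.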

Remark For a (slightly less than rigorous) probabilistic proof of Stirling's approximation, 
let $X_1, X_2, \ldots$ be independent Poisson random variables each having mean 1. 
Let $S_n = \sum_{i=1}^{n} X_i$, and note that both the mean and variance of $S_n$ are equal to n. Now, 

$$
\begin{aligned}
P\{S_n = n\} &= P\{n - 1 < S_n \le n\} \\
&= P\{-1/\sqrt{n} < (S_n - n)/\sqrt{n} \le 0\} \\
&\approx \int_{-1/\sqrt{n}}^{0} (2\pi)^{-1/2} e^{-x^2/2}\,dx \quad \text{when } n \text{ is large, by the central limit theorem} \\
&\approx (2\pi)^{-1/2} (1/\sqrt{n}) \\
&= (2\pi n)^{-1/2}
\end{aligned}
$$

But $S_n$ is Poisson with mean n, and so 

$$
P\{S_n = n\} = \frac{e^{-n} n^n}{n!}
$$

Hence, for n large 

$$
\frac{e^{-n} n^n}{n!} \approx (2\pi n)^{-1/2}
$$

or, equivalently 

$$
n! \approx n^{n+1/2} e^{-n} \sqrt{2\pi}
$$

which is Stirling's approximation. 
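A quick numerical comparison shows how close the approximation is even for moderate n. This is a small sketch; the function name is hypothetical, and n is kept at 100 or below so that the floating-point power does not overflow.

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    """Stirling's approximation n^(n + 1/2) e^(-n) sqrt(2 pi)."""
    return n ** (n + 0.5) * e ** (-n) * sqrt(2 * pi)

for n in (1, 5, 10, 50, 100):
    # The ratio n! / stirling(n) should approach 1 as n grows.
    print(n, factorial(n) / stirling(n))
```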

4.4 Long-Run Proportions and Limiting Probabilities 

For pairs of states $i \ne j$, let $f_{ij}$ denote the probability that the Markov chain, starting 
in state i, will ever make a transition into state j. That is, 

$$
f_{ij} = P(X_n = j \text{ for some } n > 0 \mid X_0 = i)
$$

We then have the following result. 

Proposition 4.3 If i is recurrent and i communicates with j, then $f_{ij} = 1$. 

Proof. Because i and j communicate there is a value n such that $P_{ij}^n > 0$. Let 
$X_0 = i$ and say that the first opportunity is a success if $X_n = j$, and note that the 
first opportunity is a success with probability $P_{ij}^n > 0$. If the first opportunity is not a 
success then consider the next time (after time n) that the chain enters state i. (Because 
state i is recurrent we can be certain that it will eventually reenter state i after time n.) 
Say that the second opportunity is a success if n time periods later the Markov chain 
is in state j. If the second opportunity is not a success then wait until the next time the 
chain enters state i and say that the third opportunity is a success if n time periods later 
the Markov chain is in state j. Continuing in this manner, we can define an unlimited 
number of opportunities, each of which is a success with the same positive probability 
$P_{ij}^n$. Because the number of opportunities until the first success occurs is geometric 
with parameter $P_{ij}^n$, it follows that with probability 1 a success will eventually occur 
and so, with probability 1, state j will eventually be entered. ■ 

If state j is recurrent, let $m_j$ denote the expected number of transitions that it takes the 
Markov chain when starting in state j to return to that state. That is, with 

$$
N_j = \min\{n > 0 : X_n = j\}
$$

equal to the number of transitions until the Markov chain makes a transition into 
state j, 

$$
m_j = E[N_j \mid X_0 = j]
$$

Definition: Say that the recurrent state j is positive recurrent if $m_j < \infty$ and say 
that it is null recurrent if $m_j = \infty$. 

Now suppose that the Markov chain is irreducible and recurrent. In this case we now 
show that the long-run proportion of time that the chain spends in state j is equal to 
$1/m_j$. That is, letting $\pi_j$ denote the long-run proportion of time that the Markov chain 
is in state j, we have the following proposition. 

Proposition 4.4 If the Markov chain is irreducible and recurrent, then for any initial 
state 

$$
\pi_j = 1/m_j
$$

Proof. Suppose that the Markov chain starts in state i, and let $T_1$ denote the number 
of transitions until the chain enters state j; then let $T_2$ denote the additional number of 
transitions from time $T_1$ until the Markov chain next enters state j; then let $T_3$ denote 
the additional number of transitions from time $T_1 + T_2$ until the Markov chain next 
enters state j, and so on. Note that $T_1$ is finite because Proposition 4.3 tells us that with 
probability 1 a transition into j will eventually occur. Also, for $n \ge 2$, because $T_n$ is the 
number of transitions between the (n - 1)th and the nth transition into state j, it follows 
from the Markovian property that $T_2, T_3, \ldots$ are independent and identically distributed 
with mean $m_j$. Because the nth transition into state j occurs at time $T_1 + \cdots + T_n$ we 
obtain that $\pi_j$, the long-run proportion of time that the chain is in state j, is 

$$
\begin{aligned}
\pi_j &= \lim_{n\to\infty} \frac{n}{\sum_{i=1}^{n} T_i} \\
&= \lim_{n\to\infty} \frac{1}{\frac{1}{n}\sum_{i=1}^{n} T_i} \\
&= \lim_{n\to\infty} \frac{1}{\frac{T_1}{n} + \frac{T_2 + \cdots + T_n}{n}} \\
&= \frac{1}{m_j}
\end{aligned}
$$

where the last equality follows because $\lim_{n\to\infty} T_1/n = 0$ and, from the strong law of 
large numbers, $\lim_{n\to\infty} \frac{T_2 + \cdots + T_n}{n} = \lim_{n\to\infty} \frac{T_2 + \cdots + T_n}{n - 1} \cdot \frac{n - 1}{n} = m_j$. ■ 

Because $m_j < \infty$ is equivalent to $1/m_j > 0$, it follows from the preceding that state 
j is positive recurrent if and only if $\pi_j > 0$. We now exploit this to show that positive 
recurrence is a class property. 

Proposition 4.5 If i is positive recurrent and $i \leftrightarrow j$ then j is positive recurrent. 

Proof. Suppose that i is positive recurrent and that $i \leftrightarrow j$. Now, let n be such that 
$P_{ij}^n > 0$. Because $\pi_i$ is the long-run proportion of time that the chain is in state i, and 
$P_{ij}^n$ is the long-run proportion of time when the Markov chain is in state i that it will be in 
state j after n transitions 

$$
\begin{aligned}
\pi_i P_{ij}^n &= \text{long-run proportion of time the chain is in } i \text{ and will be in } j \text{ after } n \text{ transitions} \\
&= \text{long-run proportion of time the chain is in } j \text{ and was in } i \; n \text{ transitions ago} \\
&\le \text{long-run proportion of time the chain is in } j
\end{aligned}
$$

Hence, $\pi_j \ge \pi_i P_{ij}^n > 0$, showing that j is positive recurrent. ■ 

Remarks 


(i) It follows from the preceding result that null recurrence is also a class property. For 
suppose that i is null recurrent and $i \leftrightarrow j$. Because i is recurrent and $i \leftrightarrow j$ we can 
conclude that j is recurrent. But if j were positive recurrent then by the preceding 
proposition i would also be positive recurrent. Because i is not positive recurrent, 
neither is j. 

(ii) An irreducible finite state Markov chain must be positive recurrent. For we know 
that such a chain must be recurrent; hence, all its states are either positive recurrent 
or null recurrent. If they were null recurrent then all the long run proportions 
would equal 0, which is impossible when there are only a finite number of states. 
Consequently, we can conclude that the chain is positive recurrent. ■ 

To determine the long-run proportions $\{\pi_j, j \ge 1\}$, note, because $\pi_i$ is the long-run 
proportion of transitions that come from state i, that 

$$
\pi_i P_{ij} = \text{long-run proportion of transitions that go from state } i \text{ to state } j
$$

Summing the preceding over all i now yields that 

$$
\pi_j = \sum_i \pi_i P_{ij}
$$

Indeed, the following important theorem can be proven. 

Theorem 4.1 Consider an irreducible Markov chain. If the chain is positive recurrent 
then the long-run proportions are the unique solution of the equations 

$$
\begin{aligned}
\pi_j &= \sum_i \pi_i P_{ij}, \quad j \ge 1 \\
\sum_j \pi_j &= 1
\end{aligned}
$$

Moreover, if there is no solution of the preceding linear equations, then the Markov 
chain is either transient or null recurrent and all $\pi_j = 0$. 

Example 4.20 Consider Example 4.1, in which we assume that if it rains today, then 
it will rain tomorrow with probability $\alpha$; and if it does not rain today, then it will rain 
tomorrow with probability $\beta$. If we say that the state is 0 when it rains and 1 when it 
does not rain, then by Theorem 4.1 the long-run proportions $\pi_0$ and $\pi_1$ are given by 

$$
\begin{aligned}
\pi_0 &= \alpha \pi_0 + \beta \pi_1, \\
\pi_1 &= (1 - \alpha)\pi_0 + (1 - \beta)\pi_1, \\
\pi_0 + \pi_1 &= 1
\end{aligned}
$$

which yields that 

$$
\pi_0 = \frac{\beta}{1 + \beta - \alpha}, \qquad \pi_1 = \frac{1 - \alpha}{1 + \beta - \alpha}
$$

For example if $\alpha = 0.7$ and $\beta = 0.4$, then the long-run proportion of rain is $\pi_0 = 4/7 = 
0.571$. ■ 
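The two-state solution can be checked in exact arithmetic, together with the relation $\pi_j = 1/m_j$ of Proposition 4.4. The sketch below uses hypothetical function names. The mean return time to the rain state is computed by a direct argument that is an addition here, not part of the example: from state 0 the chain either stays (one step) or moves to state 1 and then needs a geometric number of steps with mean $1/\beta$ to come back.

```python
from fractions import Fraction as F

def two_state_proportions(alpha, beta):
    """Long-run proportions (pi_0, pi_1) for the rain chain of Example 4.20."""
    pi0 = beta / (1 + beta - alpha)
    return pi0, 1 - pi0

def mean_return_time_to_rain(alpha, beta):
    """m_0 = alpha * 1 + (1 - alpha) * (1 + 1/beta), simplified."""
    return 1 + (1 - alpha) / beta

alpha, beta = F(7, 10), F(2, 5)
pi0, pi1 = two_state_proportions(alpha, beta)
m0 = mean_return_time_to_rain(alpha, beta)
print(pi0, pi1, m0)  # 4/7 3/7 7/4
```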


Example 4.21 Consider Example 4.3 in which the mood of an individual is considered 
as a three-state Markov chain having a transition probability matrix 

$$
P = \begin{pmatrix}
0.5 & 0.4 & 0.1 \\
0.3 & 0.4 & 0.3 \\
0.2 & 0.3 & 0.5
\end{pmatrix}
$$

In the long run, what proportion of time is the process in each of the three states? 

Solution: The long-run proportions $\pi_i$, i = 0, 1, 2, are obtained by solving the set 
of equations in Theorem 4.1. In this case these equations are 

$$
\begin{aligned}
\pi_0 &= 0.5\pi_0 + 0.3\pi_1 + 0.2\pi_2, \\
\pi_1 &= 0.4\pi_0 + 0.4\pi_1 + 0.3\pi_2, \\
\pi_2 &= 0.1\pi_0 + 0.3\pi_1 + 0.5\pi_2, \\
\pi_0 + \pi_1 + \pi_2 &= 1
\end{aligned}
$$

Solving yields 

$$
\pi_0 = \frac{21}{62}, \quad \pi_1 = \frac{23}{62}, \quad \pi_2 = \frac{18}{62} \qquad ■
$$
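The linear system of Theorem 4.1 can be solved exactly for any small irreducible chain. The following is one possible sketch (the function name is hypothetical, and Gauss-Jordan elimination over fractions is a choice made here for exactness, not a method from the text). It drops one redundant balance equation and replaces it with the normalization condition.

```python
from fractions import Fraction as F

def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 exactly by Gauss-Jordan elimination."""
    n = len(P)
    # Rows 0..n-2: balance equations sum_i pi_i P[i][j] - pi_j = 0 for j = 0..n-2.
    A = [[F(P[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)] for j in range(n - 1)]
    # Last row: normalization pi_0 + ... + pi_{n-1} = 1.
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

P_mood = [
    [F("0.5"), F("0.4"), F("0.1")],
    [F("0.3"), F("0.4"), F("0.3")],
    [F("0.2"), F("0.3"), F("0.5")],
]
print(stationary_distribution(P_mood))  # 21/62, 23/62 and 9/31 (= 18/62)
```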

Example 4.22 (A Model of Class Mobility) A problem of interest to sociologists 
is to determine the proportion of society that has an upper- or lower-class occupation. 
One possible mathematical model would be to assume that transitions between social 
classes of the successive generations in a family can be regarded as transitions of a 
Markov chain. That is, we assume that the occupation of a child depends only on his 
or her parent’s occupation. Let us suppose that such a model is appropriate and that the 
transition probability matrix is given by 



$$
P = \begin{pmatrix}
0.45 & 0.48 & 0.07 \\
0.05 & 0.70 & 0.25 \\
0.01 & 0.50 & 0.49
\end{pmatrix}
$$


That is, for instance, we suppose that the child of a middle-class worker will attain 
an upper-, middle-, or lower-class occupation with respective probabilities 0.05, 
0.70, 0.25. 

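The long-run proportions $\pi_i$ thus satisfy 

$$
\begin{aligned}
\pi_0 &= 0.45\pi_0 + 0.05\pi_1 + 0.01\pi_2, \\
\pi_1 &= 0.48\pi_0 + 0.70\pi_1 + 0.50\pi_2, \\
\pi_2 &= 0.07\pi_0 + 0.25\pi_1 + 0.49\pi_2, \\
\pi_0 + \pi_1 + \pi_2 &= 1
\end{aligned}
$$

Hence, 

$$
\pi_0 = 0.07, \quad \pi_1 = 0.62, \quad \pi_2 = 0.31
$$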

In other words, a society in which social mobility between classes can be described 
by a Markov chain with transition probability matrix given by Equation (4.8) has, in 
the long run, 7 percent of its people in upper-class jobs, 62 percent of its people in 
middle-class jobs, and 31 percent in lower-class jobs. ■ 

Example 4.23 (The Hardy-Weinberg Law and a Markov Chain in Genetics) 

Consider a large population of individuals, each of whom possesses a particular pair of 
genes, of which each individual gene is classified as being of type A or type a. Assume 
that the proportions of individuals whose gene pairs are AA, aa, or Aa are, respectively, 
$p_0$, $q_0$, and $r_0$ ($p_0 + q_0 + r_0 = 1$). When two individuals mate, each contributes 
one of his or her genes, chosen at random, to the resultant offspring. Assuming that 
the mating occurs at random, in that each individual is equally likely to mate with any 
other individual, we are interested in determining the proportions of individuals in the 
next generation whose genes are AA, aa, or Aa. Calling these proportions p, q, and r, 
they are easily obtained by focusing attention on an individual of the next generation 
and then determining the probabilities for the gene pair of that individual. 

To begin, note that randomly choosing a parent and then randomly choosing one of 
its genes is equivalent to just randomly choosing a gene from the total gene population. 
By conditioning on the gene pair of the parent, we see that a randomly chosen gene 
will be type A with probability 

$$
\begin{aligned}
P\{A\} &= P\{A \mid AA\}\,p_0 + P\{A \mid aa\}\,q_0 + P\{A \mid Aa\}\,r_0 \\
&= p_0 + r_0/2
\end{aligned}
$$

Similarly, it will be type a with probability 

$$
P\{a\} = q_0 + r_0/2
$$

Thus, under random mating a randomly chosen member of the next generation will be 
type AA with probability p, where 

$$
p = P\{A\}P\{A\} = (p_0 + r_0/2)^2
$$

Similarly, the randomly chosen member will be type aa with probability 

$$
q = P\{a\}P\{a\} = (q_0 + r_0/2)^2
$$

and will be type Aa with probability 

$$
r = 2P\{A\}P\{a\} = 2(p_0 + r_0/2)(q_0 + r_0/2)
$$

Since each member of the next generation will independently be of each of the three 
gene types with probabilities p, q, r, it follows that the percentages of the members of 
the next generation that are of type AA, aa, or Aa are respectively p, q, and r. 

If we now consider the total gene pool of this next generation, then p + r/2, the 
fraction of its genes that are A, will be unchanged from the previous generation. This 
follows either by arguing that the total gene pool has not changed from generation to 
generation or by the following simple algebra: 

$$
\begin{aligned}
p + r/2 &= (p_0 + r_0/2)^2 + (p_0 + r_0/2)(q_0 + r_0/2) \\
&= (p_0 + r_0/2)[p_0 + r_0/2 + q_0 + r_0/2] \\
&= p_0 + r_0/2 \quad \text{since } p_0 + r_0 + q_0 = 1 \\
&= P\{A\}
\end{aligned} \tag{4.9}
$$

Thus, the fractions of the gene pool that are A and a are the same as in the initial 
generation. From this it follows that, under random mating, in all successive generations 
after the initial one the percentages of the population having gene pairs AA, aa, and 
Aa will remain fixed at the values p, q, and r. This is known as the Hardy-Weinberg 
law. 
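The fixed-point property behind the Hardy-Weinberg law is easy to confirm in exact arithmetic. The sketch below uses a hypothetical function name, and the starting proportions 1/2, 1/4, 1/4 are arbitrary illustration values. It applies the random-mating map twice and shows that the second application changes nothing.

```python
from fractions import Fraction as F

def next_generation(p0, q0, r0):
    """Map the proportions of (AA, aa, Aa) to those of the next generation."""
    prob_A = p0 + r0 / 2  # probability a randomly chosen gene is type A
    prob_a = q0 + r0 / 2  # probability a randomly chosen gene is type a
    return prob_A ** 2, prob_a ** 2, 2 * prob_A * prob_a

first = next_generation(F(1, 2), F(1, 4), F(1, 4))
second = next_generation(*first)
print(first)            # proportions after one generation: 25/64, 9/64, 15/32
print(second == first)  # True: the proportions no longer change
```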

Suppose now that the gene pair population has stabilized in the percentages p,q, r, 
and let us follow the genetic history of a single individual and her descendants. (For 
simplicity, assume that each individual has exactly one offspring.) So, for a given 
individual, let X n denote the genetic state of her descendant in the nth generation. The 
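transition probability matrix of this Markov chain, namely, 

$$
\begin{array}{c|ccc}
 & AA & aa & Aa \\ \hline
AA & p + \dfrac{r}{2} & 0 & q + \dfrac{r}{2} \\
aa & 0 & q + \dfrac{r}{2} & p + \dfrac{r}{2} \\
Aa & \dfrac{p}{2} + \dfrac{r}{4} & \dfrac{q}{2} + \dfrac{r}{4} & \dfrac{p}{2} + \dfrac{q}{2} + \dfrac{r}{2}
\end{array}
$$

is easily verified by conditioning on the state of the randomly chosen mate. It is quite 
intuitive (why?) that the limiting probabilities for this Markov chain (which also equal 
the fractions of the individual's descendants that are in each of the three genetic states) 
should just be p, q, and r. To verify this we must show that they satisfy Theorem 4.1. 
Because one of the equations in Theorem 4.1 is redundant, it suffices to show that 

$$
\begin{aligned}
p &= p\left(p + \frac{r}{2}\right) + r\left(\frac{p}{2} + \frac{r}{4}\right) = \left(p + \frac{r}{2}\right)^2 \\
q &= q\left(q + \frac{r}{2}\right) + r\left(\frac{q}{2} + \frac{r}{4}\right) = \left(q + \frac{r}{2}\right)^2 \\
p + q + r &= 1
\end{aligned}
$$

But this follows from Equation (4.9), and thus the result is established. ■ 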
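The stationarity of (p, q, r) for the descendant chain can also be verified numerically. This sketch assumes the state order AA, aa, Aa and uses the equilibrium proportions 25/64, 9/64, 30/64, an arbitrary illustration obtained by applying one round of random mating to initial proportions 1/2, 1/4, 1/4. The function name is hypothetical.

```python
from fractions import Fraction as F

def genetics_matrix(p, q, r):
    """Transition matrix of the descendant chain, states ordered AA, aa, Aa."""
    return [
        [p + r / 2, F(0), q + r / 2],
        [F(0), q + r / 2, p + r / 2],
        [p / 2 + r / 4, q / 2 + r / 4, p / 2 + q / 2 + r / 2],
    ]

p, q, r = F(25, 64), F(9, 64), F(30, 64)  # assumed Hardy-Weinberg equilibrium values
P = genetics_matrix(p, q, r)
pi = [p, q, r]
next_pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print([sum(row) for row in P])  # each row sums to 1
print(next_pi == pi)            # True: (p, q, r) is stationary
```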

Example 4.24 Suppose that a production process changes states in accordance 
with an irreducible, positive recurrent Markov chain having transition probabilities 
$P_{ij}$, i, j = 1, ..., n, and suppose that certain of the states are considered acceptable 
and the remaining unacceptable. Let $A$ denote the acceptable states and $A^c$ the unacceptable 
ones. If the production process is said to be "up" when in an acceptable state 
and "down" when in an unacceptable state, determine 

1. the rate at which the production process goes from up to down (that is, the rate of 
breakdowns); 

2. the average length of time the process remains down when it goes down; and 

3. the average length of time the process remains up when it goes up. 

Solution: Let $\pi_k$, k = 1, ..., n, denote the long-run proportions. Now for $i \in A$ 
and $j \in A^c$ the rate at which the process enters state j from state i is 

$$
\text{rate enter } j \text{ from } i = \pi_i P_{ij}
$$

and so the rate at which the production process enters state j from an acceptable 
state is 

$$
\text{rate enter } j \text{ from } A = \sum_{i \in A} \pi_i P_{ij}
$$

Hence, the rate at which it enters an unacceptable state from an acceptable one (which 
is the rate at which breakdowns occur) is 

$$
\text{rate breakdowns occur} = \sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij} \tag{4.10}
$$

Now let U and D denote the average time the process remains up when it goes 
up and down when it goes down. Because there is a single breakdown every U + D 
time units on the average, it follows heuristically that 

$$
\text{rate at which breakdowns occur} = \frac{1}{U + D}
$$

and so from Equation (4.10), 

$$
\frac{1}{U + D} = \sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij} \tag{4.11}
$$

To obtain a second equation relating U and D, consider the percentage of time the 
process is up, which, of course, is equal to $\sum_{i \in A} \pi_i$. However, since the process is 
up on the average U out of every U + D time units, it follows (again somewhat 
heuristically) that the 

$$
\text{proportion of up time} = \frac{U}{U + D}
$$

and so 

$$
\frac{U}{U + D} = \sum_{i \in A} \pi_i \tag{4.12}
$$

Hence, from Equations (4.11) and (4.12) we obtain 

$$
\begin{aligned}
U &= \frac{\sum_{i \in A} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}}, \\
D &= \frac{1 - \sum_{i \in A} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}}
= \frac{\sum_{i \in A^c} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}}
\end{aligned}
$$

For example, suppose the transition probability matrix is 

$$
P = \begin{pmatrix}
1/4 & 1/4 & 1/2 & 0 \\
0 & 1/4 & 1/2 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 0 & 1/2
\end{pmatrix}
$$

where the acceptable (up) states are 1, 2 and the unacceptable (down) ones are 3, 4. 
The long-run proportions satisfy 

$$
\begin{aligned}
\pi_1 &= \pi_1 \tfrac{1}{4} + \pi_3 \tfrac{1}{4} + \pi_4 \tfrac{1}{4}, \\
\pi_2 &= \pi_1 \tfrac{1}{4} + \pi_2 \tfrac{1}{4} + \pi_3 \tfrac{1}{4} + \pi_4 \tfrac{1}{4}, \\
\pi_3 &= \pi_1 \tfrac{1}{2} + \pi_2 \tfrac{1}{2} + \pi_3 \tfrac{1}{4}, \\
\pi_1 + \pi_2 + \pi_3 + \pi_4 &= 1
\end{aligned}
$$

These solve to yield 

$$
\pi_1 = \frac{3}{16}, \quad \pi_2 = \frac{1}{4}, \quad \pi_3 = \frac{14}{48}, \quad \pi_4 = \frac{13}{48}
$$

and thus 

$$
\begin{aligned}
\text{rate of breakdowns} &= \pi_1 (P_{13} + P_{14}) + \pi_2 (P_{23} + P_{24}) = \frac{9}{32}, \\
U &= \frac{14}{9} \quad \text{and} \quad D = 2
\end{aligned}
$$

Hence, on the average, breakdowns occur about 9/32 (or 28 percent) of the time. They 
last, on the average, 2 time units, and then there follows a stretch of (on the average) 
14/9 time units when the system is up. ■ 
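The numbers in this example can be reproduced in exact arithmetic. The sketch below takes the long-run proportions stated above as given, confirms that they satisfy the balance equations, and then evaluates Equation (4.10) and the expressions for U and D. States 1 through 4 are stored at list indices 0 through 3, an indexing choice made here.

```python
from fractions import Fraction as F

q, h = F(1, 4), F(1, 2)
P = [
    [q, q, h, 0],
    [0, q, h, q],
    [q, q, q, q],
    [q, q, 0, h],
]
pi = [F(3, 16), F(1, 4), F(14, 48), F(13, 48)]  # states 1..4 at indices 0..3
up, down = [0, 1], [2, 3]                       # acceptable and unacceptable states

# Check the balance equations pi_j = sum_i pi_i P_ij before using pi.
assert all(sum(pi[i] * P[i][j] for i in range(4)) == pi[j] for j in range(4))

breakdown_rate = sum(pi[i] * P[i][j] for i in up for j in down)  # Equation (4.10)
U = sum(pi[i] for i in up) / breakdown_rate
D = sum(pi[i] for i in down) / breakdown_rate
print(breakdown_rate, U, D)  # 9/32 14/9 2
```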

The long-run proportions $\pi_j$, $j \ge 0$, are often called stationary probabilities. The 
reason is that if the initial state is chosen according to the probabilities $\pi_j$, $j \ge 0$, 
then the probability of being in state j at any time n is also equal to $\pi_j$. That is, if 

$$
P\{X_0 = j\} = \pi_j, \quad j \ge 0
$$

then 

$$
P\{X_n = j\} = \pi_j \quad \text{for all } n, \; j \ge 0
$$

The preceding is easily proven by induction, for it is true when n = 0 and if we suppose 
it true for n - 1, then writing 

$$
\begin{aligned}
P\{X_n = j\} &= \sum_i P\{X_n = j \mid X_{n-1} = i\}\,P\{X_{n-1} = i\} \\
&= \sum_i P_{ij} \pi_i \quad \text{by the induction hypothesis} \\
&= \pi_j \quad \text{by Theorem 4.1}
\end{aligned}
$$

Example 4.25 Suppose the numbers of families that check into a hotel on successive 
days are independent Poisson random variables with mean $\lambda$. Also suppose that the 
number of days that a family stays in the hotel is a geometric random variable with 
parameter p, 0 < p < 1. (Thus, a family who spent the previous night in the hotel 
will, independently of how long they have already spent in the hotel, check out the next 
day with probability p.) Also suppose that all families act independently of each other. 
Under these conditions it is easy to see that if $X_n$ denotes the number of families that 
are checked in the hotel at the beginning of day n then $\{X_n, n \ge 0\}$ is a Markov chain. 
Find 

(a) the transition probabilities of this Markov chain; 

(b) $E[X_n \mid X_0 = i]$; 

(c) the stationary probabilities of this Markov chain. 

Solution: (a) To find $P_{ij}$, suppose there are i families checked into the hotel at the 
beginning of a day. Because each of these i families will stay for another day with 
probability q = 1 - p it follows that $R_i$, the number of these families that remain 
another day, is a binomial (i, q) random variable. So, letting N be the number of new 
families that check in that day, we see that 

$$
P_{ij} = P(R_i + N = j)
$$

Conditioning on $R_i$ and using that N is Poisson with mean $\lambda$, we obtain 

$$
\begin{aligned}
P_{ij} &= \sum_{k=0}^{i} P(R_i + N = j \mid R_i = k) \binom{i}{k} q^k p^{i-k} \\
&= \sum_{k=0}^{\min(i,j)} P(N = j - k \mid R_i = k) \binom{i}{k} q^k p^{i-k} \\
&= \sum_{k=0}^{\min(i,j)} P(N = j - k) \binom{i}{k} q^k p^{i-k} \\
&= \sum_{k=0}^{\min(i,j)} e^{-\lambda} \frac{\lambda^{j-k}}{(j-k)!} \binom{i}{k} q^k p^{i-k}
\end{aligned}
$$


(b) Using the preceding representation Ri + N for the next state from state i, we see 
that 


E[X n \X n -\ = i] = E[Ri + N] = iq+k 
Consequently, 

E[X n \X n - l ] = X„- iq + k 
Taking expectations of both sides yields 
E[X n ] = k+qE[X n - 1 ] 
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Iterating the preceding gives 


E[X„]=k + qE[X„- 1 ] 

= X + q(X + qE[X n _ 2]) 

= X + qX + q~ E[X n - 2] 

= X + + <7“(A + ^Z?[X„_3]) 

= X + qX + + q^ E[X n — 3] 


showing that 



and yielding the result 


A(1 - q n ) 
£[Z n |X 0 = i] = --— 


P 
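As a quick numerical check of part (b), the recursion $E[X_n] = \lambda + qE[X_{n-1}]$ can be iterated directly and compared with the closed form. The following is a minimal sketch; the function names and the sample values of $\lambda$, $p$, and $i$ are illustrative assumptions, not part of the example.

```python
def expected_families(n, i, lam, p):
    """Iterate E[X_n] = lam + q * E[X_{n-1}], starting from E[X_0] = i."""
    q = 1.0 - p
    mean = float(i)
    for _ in range(n):
        mean = lam + q * mean
    return mean


def expected_families_closed_form(n, i, lam, p):
    """Closed form lam * (1 - q**n) / p + q**n * i from part (b)."""
    q = 1.0 - p
    return lam * (1.0 - q**n) / p + q**n * i


# Hypothetical parameters: lam = 3 arrivals per day, checkout probability p = 0.25.
iterated = expected_families(10, 7, 3.0, 0.25)
closed = expected_families_closed_form(10, 7, 3.0, 0.25)
```

For large $n$ both expressions approach $\lambda/p$, whatever the initial number of families.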


(c) To find the stationary probabilities we will not directly use the complicated
transition probabilities derived in part (a). Rather we will make use of the fact that
the stationary probability distribution is the only distribution on the initial state that
results in the next state having the same distribution. Now, suppose that the initial
state $X_0$ has a Poisson distribution with mean $\alpha$. That is, assume that the number of
families initially in the hotel is Poisson with mean $\alpha$. Let $R$ denote the number of
these families that remain in the hotel at the beginning of the next day. Then, using
the result of Example 3.23 that if each of a Poisson distributed (with mean $\alpha$) number
of events occurs with probability $q$, then the total number of these events that occur
is Poisson distributed with mean $\alpha q$, it follows that $R$ is a Poisson random variable
with mean $\alpha q$. In addition, the number of new families that check in during the day,
call it $N$, is Poisson with mean $\lambda$, and is independent of $R$. Hence, since the sum
of independent Poisson random variables is also Poisson distributed, it follows that
$R + N$, the number of guests at the beginning of the next day, is Poisson with mean
$\lambda + \alpha q$. Consequently, if we choose $\alpha$ so that

$$\alpha = \lambda + \alpha q$$

then the distribution of $X_1$ would be the same as that of $X_0$. But this means that
when the initial distribution of $X_0$ is Poisson with mean $\alpha = \lambda/p$, then so is the
distribution of $X_1$, implying that this is the stationary distribution. That is, the stationary
probabilities are

$$\pi_i = e^{-\lambda/p} (\lambda/p)^i / i!, \quad i \ge 0$$
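The claim in part (c) can also be checked against the transition probabilities of part (a): if $X_0$ is Poisson with mean $\lambda/p$, then $\sum_i \pi_i P_{i,j}$ should again be the Poisson probability $\pi_j$. The sketch below does this numerically; the function names, the truncation point `cutoff`, and the parameter values are assumptions made for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    """P(Poisson(mean) = k)."""
    return exp(-mean) * mean**k / factorial(k)


def transition_prob(i, j, lam, p):
    """P_{i,j} from part (a): k of the i families stay, j - k new ones arrive."""
    q = 1.0 - p
    return sum(
        poisson_pmf(j - k, lam) * comb(i, k) * q**k * p ** (i - k)
        for k in range(min(i, j) + 1)
    )


def next_day_pmf(j, lam, p, cutoff=60):
    """P(X_1 = j) when X_0 ~ Poisson(lam / p); the sum over i is truncated at cutoff."""
    alpha = lam / p
    return sum(poisson_pmf(i, alpha) * transition_prob(i, j, lam, p) for i in range(cutoff))


# Hypothetical parameters: lam = 2, p = 0.5, so the stationary mean is lam / p = 4.
differences = [abs(next_day_pmf(j, 2.0, 0.5) - poisson_pmf(j, 4.0)) for j in range(10)]
```

Each entry of `differences` should be zero up to truncation and rounding error.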


The preceding model has an important generalization. Namely, consider an organization
whose workers are of $r$ distinct types. For instance, the organization could
be a law firm and its lawyers could either be juniors, associates, or partners. Suppose
that a worker who is currently type $i$ will in the next period become type $j$
with probability $q_{i,j}$ for $j = 1, \dots, r$ or will leave the organization with probability
$1 - \sum_{j=1}^{r} q_{i,j}$. In addition, suppose that new workers are hired each period, and
that the numbers of types $1, \dots, r$ workers hired are independent Poisson random
variables with means $\lambda_1, \dots, \lambda_r$. If we let $\mathbf{X}_n = (X_n(1), \dots, X_n(r))$, where $X_n(i)$
is the number of type $i$ workers in the organization at the beginning of period $n$,
then $\mathbf{X}_n$, $n \ge 0$ is a Markov chain. To compute its stationary probability distribution,
suppose that the initial state is chosen so that the numbers of workers of different
types are independent Poisson random variables, with $\alpha_i$ being the mean number
of type $i$ workers. That is, suppose that $X_0(1), \dots, X_0(r)$ are independent Poisson
random variables with respective means $\alpha_1, \dots, \alpha_r$. Also, let $N_j$, $j = 1, \dots, r$, be
the number of new type $j$ workers hired during the initial period. Now, fix $i$, and for
$j = 1, \dots, r$, let $M_i(j)$ be the number of the $X_0(i)$ type $i$ workers who become type $j$
in the next period. Then because each of the Poisson number $X_0(i)$ of type $i$ workers
will independently become type $j$ with probability $q_{i,j}$, $j = 1, \dots, r$, it follows from
the remarks following Example 3.23 that $M_i(1), \dots, M_i(r)$ are independent Poisson
random variables with $M_i(j)$ having mean $\alpha_i q_{i,j}$. Because $X_0(1), \dots, X_0(r)$
are, by assumption, independent, we can also conclude that the random variables
$M_i(j)$, $i, j = 1, \dots, r$ are all independent. Because the sum of independent Poisson
random variables is also Poisson distributed, the preceding yields that the random
variables

$$X_1(j) = N_j + \sum_{i=1}^{r} M_i(j), \quad j = 1, \dots, r$$

are independent Poisson random variables with means

$$E[X_1(j)] = \lambda_j + \sum_{i=1}^{r} \alpha_i q_{i,j}$$

Hence, if $\alpha_1, \dots, \alpha_r$ satisfied

$$\alpha_j = \lambda_j + \sum_{i=1}^{r} \alpha_i q_{i,j}, \quad j = 1, \dots, r$$

then $\mathbf{X}_1$ would have the same distribution as $\mathbf{X}_0$. Consequently, if we let $\alpha_1^o, \dots, \alpha_r^o$
be such that

$$\alpha_j^o = \lambda_j + \sum_{i=1}^{r} \alpha_i^o q_{i,j}, \quad j = 1, \dots, r$$

then the stationary distribution of the Markov chain is the distribution that takes the
number of workers in each type to be independent Poisson random variables with
means $\alpha_1^o, \dots, \alpha_r^o$. That is, the long-run proportions are

$$\pi_{k_1, \dots, k_r} = \prod_{i=1}^{r} e^{-\alpha_i^o} (\alpha_i^o)^{k_i} / k_i!$$

It can be shown that there will be such values $\alpha_j^o$, $j = 1, \dots, r$, provided that, with
probability 1, each worker eventually leaves the organization. Also, because there is
a unique stationary distribution, there can only be one such set of values. ■
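The stationary means solve a small linear system, $\alpha_j = \lambda_j + \sum_i \alpha_i q_{i,j}$, which can be approximated by repeated substitution when every worker eventually leaves. The sketch below assumes a hypothetical three-type law firm; the hiring rates and promotion probabilities are invented for illustration only.

```python
def stationary_means(lam, q, iterations=10_000, tol=1e-12):
    """Approximate the solution of alpha_j = lam_j + sum_i alpha_i * q[i][j]
    by fixed-point iteration, stopping once successive iterates agree to tol."""
    r = len(lam)
    alpha = list(lam)
    for _ in range(iterations):
        new = [lam[j] + sum(alpha[i] * q[i][j] for i in range(r)) for j in range(r)]
        if max(abs(a - b) for a, b in zip(new, alpha)) < tol:
            return new
        alpha = new
    return alpha


# Hypothetical data: types are junior, associate, partner.
lam = [10.0, 2.0, 0.5]            # mean numbers hired per period
q = [
    [0.6, 0.2, 0.0],              # a junior stays, is promoted, or leaves (prob. 0.2)
    [0.0, 0.7, 0.1],              # an associate stays, is promoted, or leaves (prob. 0.2)
    [0.0, 0.0, 0.9],              # a partner stays or leaves (prob. 0.1)
]
alpha = stationary_means(lam, q)
```

With these numbers the system can also be solved by hand, giving means 25, 70/3, and 85/3, which the iteration should reproduce.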

The following example exploits the relationship $m_i = 1/\pi_i$, which states that the mean
time between visits to a state is the inverse of the long-run proportion of time the
chain is in that state, to obtain a method for computing the mean time until a specified
pattern appears when the data constitute the successive states of a Markov chain.

Example 4.26 (Mean Pattern Times in Markov Chain Generated Data) Consider
an irreducible Markov chain $\{X_n, n \ge 0\}$ with transition probabilities $P_{i,j}$ and stationary
probabilities $\pi_j$, $j \ge 0$. Starting in state $r$, we are interested in determining the
expected number of transitions until the pattern $i_1, i_2, \dots, i_k$ appears. That is, with

$$N(i_1, i_2, \dots, i_k) = \min\{n \ge k : X_{n-k+1} = i_1, \dots, X_n = i_k\}$$

we are interested in

$$E[N(i_1, i_2, \dots, i_k) \mid X_0 = r]$$

Note that even if $i_1 = r$, the initial state $X_0$ is not considered part of the pattern
sequence.

Let $\mu(i, i_1)$ be the mean number of transitions for the chain to enter state $i_1$, given
that the initial state is $i$, $i \ge 0$. The quantities $\mu(i, i_1)$ can be determined as the solution
of the following set of equations, obtained by conditioning on the first transition out of
state $i$:

$$\mu(i, i_1) = 1 + \sum_{j \ne i_1} P_{i,j}\, \mu(j, i_1), \quad i \ge 0$$

For the Markov chain $\{X_n, n \ge 0\}$ associate a corresponding Markov chain, which we
will refer to as the $k$-chain, whose state at any time is the sequence of the most recent
$k$ states of the original chain. (For instance, if $k = 3$ and $X_2 = 4$, $X_3 = 1$, $X_4 = 1$,
then the state of the $k$-chain at time 4 is $(4, 1, 1)$.) Let $\pi(j_1, \dots, j_k)$ be the stationary
probabilities for the $k$-chain. Because $\pi(j_1, \dots, j_k)$ is the proportion of time that the
state of the original Markov chain $k$ units ago was $j_1$ and the following $k - 1$ states, in
sequence, were $j_2, \dots, j_k$, we can conclude that

$$\pi(j_1, \dots, j_k) = \pi_{j_1} P_{j_1,j_2} \cdots P_{j_{k-1},j_k}$$

Moreover, because the mean number of transitions between successive visits of the
$k$-chain to the state $i_1, i_2, \dots, i_k$ is equal to the inverse of the stationary probability of
that state, we have that

$$E[\text{number of transitions between visits to } i_1, i_2, \dots, i_k] = \frac{1}{\pi(i_1, \dots, i_k)} \tag{4.13}$$

Let $A(i_1, \dots, i_m)$ be the additional number of transitions needed until the pattern
appears, given that the first $m$ transitions have taken the chain into states $X_1 = i_1, \dots, X_m = i_m$.
We will now consider whether the pattern has overlaps, where we say that the pattern
$i_1, i_2, \dots, i_k$ has an overlap of size $j$, $j < k$, if the sequence of its final $j$ elements is
the same as that of its first $j$ elements. That is, it has an overlap of size $j$ if

$$(i_{k-j+1}, \dots, i_k) = (i_1, \dots, i_j), \quad j < k$$

Case 1 The pattern $i_1, i_2, \dots, i_k$ has no overlaps.

Because there is no overlap, Equation (4.13) yields

$$E[N(i_1, i_2, \dots, i_k) \mid X_0 = i_k] = \frac{1}{\pi(i_1, \dots, i_k)}$$

Because the time until the pattern occurs is equal to the time until the chain enters state
$i_1$ plus the additional time, we may write

$$E[N(i_1, i_2, \dots, i_k) \mid X_0 = i_k] = \mu(i_k, i_1) + E[A(i_1)]$$

The preceding two equations imply

$$E[A(i_1)] = \frac{1}{\pi(i_1, \dots, i_k)} - \mu(i_k, i_1)$$

Using that

$$E[N(i_1, i_2, \dots, i_k) \mid X_0 = r] = \mu(r, i_1) + E[A(i_1)]$$

gives the result

$$E[N(i_1, i_2, \dots, i_k) \mid X_0 = r] = \mu(r, i_1) + \frac{1}{\pi(i_1, \dots, i_k)} - \mu(i_k, i_1)$$

where

$$\pi(i_1, \dots, i_k) = \pi_{i_1} P_{i_1,i_2} \cdots P_{i_{k-1},i_k}$$

Case 2 Now suppose that the pattern has overlaps and let its largest overlap be of
size $s$. In this case the number of transitions between successive visits of the $k$-chain
to the state $i_1, i_2, \dots, i_k$ is equal to the additional number of transitions of the original
chain until the pattern appears given that it has already made $s$ transitions with the
results $X_1 = i_1, \dots, X_s = i_s$. Therefore, from Equation (4.13)

$$E[A(i_1, \dots, i_s)] = \frac{1}{\pi(i_1, \dots, i_k)}$$

But because

$$N(i_1, i_2, \dots, i_k) = N(i_1, \dots, i_s) + A(i_1, \dots, i_s)$$

we have

$$E[N(i_1, i_2, \dots, i_k) \mid X_0 = r] = E[N(i_1, i_2, \dots, i_s) \mid X_0 = r] + \frac{1}{\pi(i_1, \dots, i_k)}$$

We can now repeat the same procedure on the pattern $i_1, \dots, i_s$, continuing to do so
until we reach one that has no overlap, and then apply the result from Case 1.

For instance, suppose the desired pattern is 1, 2, 3, 1, 2, 3, 1, 2. Its largest overlap is of
size 5, so

$$E[N(1, 2, 3, 1, 2, 3, 1, 2) \mid X_0 = r] = E[N(1, 2, 3, 1, 2) \mid X_0 = r] + \frac{1}{\pi(1, 2, 3, 1, 2, 3, 1, 2)}$$

Because the largest overlap of the pattern (1, 2, 3, 1, 2) is of size 2, the same argument
as in the preceding gives

$$E[N(1, 2, 3, 1, 2) \mid X_0 = r] = E[N(1, 2) \mid X_0 = r] + \frac{1}{\pi(1, 2, 3, 1, 2)}$$

Because the pattern (1, 2) has no overlap, we obtain from Case 1 that

$$E[N(1, 2) \mid X_0 = r] = \mu(r, 1) + \frac{1}{\pi(1, 2)} - \mu(2, 1)$$

Putting it together yields

$$E[N(1, 2, 3, 1, 2, 3, 1, 2) \mid X_0 = r] = \mu(r, 1) + \frac{1}{\pi_1 P_{1,2}} - \mu(2, 1) + \frac{1}{\pi_1 P_{1,2}^2 P_{2,3} P_{3,1}} + \frac{1}{\pi_1 P_{1,2}^3 P_{2,3}^2 P_{3,1}^2}$$

If the generated data is a sequence of independent and identically distributed random
variables, with each value equal to $j$ with probability $P_j$, then the Markov chain
has $P_{i,j} = P_j$. In this case, $\pi_j = P_j$. Also, because the time to go from state $i$ to
state $j$ is a geometric random variable with parameter $P_j$, we have $\mu(i, j) = 1/P_j$.
Thus, the expected number of data values that need be generated before the pattern
1, 2, 3, 1, 2, 3, 1, 2 appears would be

$$\frac{1}{P_1} + \frac{1}{P_1 P_2} - \frac{1}{P_1} + \frac{1}{P_1^2 P_2^2 P_3} + \frac{1}{P_1^3 P_2^3 P_3^2} = \frac{1}{P_1 P_2} + \frac{1}{P_1^2 P_2^2 P_3} + \frac{1}{P_1^3 P_2^3 P_3^2} \qquad ■$$
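For independent and identically distributed data the overlap recursion collapses to a single sum: one term $1/\pi(i_1, \dots, i_s)$ for the full pattern and for each overlap size $s$, since the $\mu$ terms cancel. The following sketch implements that special case only; the function name and the probability tables are illustrative assumptions.

```python
from math import prod


def mean_pattern_time_iid(pattern, probs):
    """Expected number of i.i.d. draws until `pattern` first appears.

    Adds 1 / P(first s values) for every length s at which the first s values
    of the pattern equal its last s values (s = len(pattern) always qualifies).
    """
    k = len(pattern)
    total = 0.0
    for s in range(1, k + 1):
        if pattern[:s] == pattern[k - s:]:
            total += 1.0 / prod(probs[value] for value in pattern[:s])
    return total


# Hypothetical value probabilities for the pattern used in the example.
p = {1: 0.5, 2: 0.3, 3: 0.2}
pattern_time = mean_pattern_time_iid((1, 2, 3, 1, 2, 3, 1, 2), p)

# Fair-coin sanity checks: HH takes 6 flips on average, HT takes 4.
coin = {"H": 0.5, "T": 0.5}
hh_time = mean_pattern_time_iid("HH", coin)
ht_time = mean_pattern_time_iid("HT", coin)
```

For a general Markov chain the terms $\mu(r, i_1)$ and $\mu(i_k, i_1)$ do not cancel and would have to be obtained from the linear equations given earlier.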


The following result is quite useful. 

Proposition 4.6 Let $\{X_n, n \ge 1\}$ be an irreducible Markov chain with stationary
probabilities $\pi_j$, $j \ge 0$, and let $r$ be a bounded function on the state space. Then, with
probability 1,

$$\lim_{N \to \infty} \frac{\sum_{n=1}^{N} r(X_n)}{N} = \sum_{j=0}^{\infty} r(j)\, \pi_j$$

Proof. If we let $a_j(N)$ be the amount of time the Markov chain spends in state $j$
during time periods $1, \dots, N$, then

$$\sum_{n=1}^{N} r(X_n) = \sum_{j=0}^{\infty} a_j(N)\, r(j)$$

Since $a_j(N)/N \to \pi_j$, the result follows from the preceding upon dividing by $N$ and
then letting $N \to \infty$. ■

If we suppose that we earn a reward $r(j)$ whenever the chain is in state $j$, then
Proposition 4.6 states that our average reward per unit time is $\sum_j r(j)\, \pi_j$.

Example 4.27 For the four state Bonus Malus automobile insurance system specified
in Example 4.7, find the average annual premium paid by a policyholder whose yearly
number of claims is a Poisson random variable with mean 1/2.

Solution: With $a_k = e^{-1/2} \dfrac{(1/2)^k}{k!}$, we have

$$a_0 = 0.6065, \quad a_1 = 0.3033, \quad a_2 = 0.0758$$

Therefore, the Markov chain of successive states has the following transition probability
matrix:

$$\begin{pmatrix}
0.6065 & 0.3033 & 0.0758 & 0.0144 \\
0.6065 & 0.0000 & 0.3033 & 0.0902 \\
0.0000 & 0.6065 & 0.0000 & 0.3935 \\
0.0000 & 0.0000 & 0.6065 & 0.3935
\end{pmatrix}$$

The stationary probabilities are given as the solution of

$$\begin{aligned}
\pi_1 &= 0.6065\pi_1 + 0.6065\pi_2, \\
\pi_2 &= 0.3033\pi_1 + 0.6065\pi_3, \\
\pi_3 &= 0.0758\pi_1 + 0.3033\pi_2 + 0.6065\pi_4, \\
\pi_1 + \pi_2 + \pi_3 + \pi_4 &= 1
\end{aligned}$$

Rewriting the first three of these equations gives

$$\begin{aligned}
\pi_2 &= \frac{1 - 0.6065}{0.6065}\, \pi_1 = 0.6488\pi_1, \\
\pi_3 &= \frac{\pi_2 - 0.3033\pi_1}{0.6065} = 0.5697\pi_1, \\
\pi_4 &= \frac{\pi_3 - 0.0758\pi_1 - 0.3033\pi_2}{0.6065} = 0.4900\pi_1
\end{aligned}$$

Using that $\sum_{i=1}^{4} \pi_i = 1$ gives the solution (rounded to four decimal places)

$$\pi_1 = 0.3692, \quad \pi_2 = 0.2395, \quad \pi_3 = 0.2103, \quad \pi_4 = 0.1809$$

Therefore, the average annual premium paid is

$$200\pi_1 + 250\pi_2 + 400\pi_3 + 600\pi_4 = 326.375 \qquad ■$$
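The computation in Example 4.27 can be reproduced numerically. The sketch below approximates the stationary probabilities by repeatedly multiplying a probability vector by the transition matrix, then applies Proposition 4.6 with the premiums 200, 250, 400, 600 as rewards. The use of repeated multiplication, rather than solving the linear equations directly, is a choice made here for brevity, and the function name is illustrative.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate pi = pi P by repeatedly multiplying a row vector by P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Transition matrix of the four-state Bonus Malus chain from the example.
P = [
    [0.6065, 0.3033, 0.0758, 0.0144],
    [0.6065, 0.0000, 0.3033, 0.0902],
    [0.0000, 0.6065, 0.0000, 0.3935],
    [0.0000, 0.0000, 0.6065, 0.3935],
]
premiums = [200, 250, 400, 600]

pi = stationary_distribution(P)
average_premium = sum(r * w for r, w in zip(premiums, pi))
```

Because the chain is irreducible and state 1 can return to itself in one step, the iteration settles on the same vector regardless of the starting distribution.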


4.4.1 Limiting Probabilities

In Example 4.1 we considered a two-state Markov chain with transition probability
matrix

$$\mathbf{P} = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$$

and showed that

$$\mathbf{P}^{(4)} = \begin{pmatrix} 0.5749 & 0.4251 \\ 0.5668 & 0.4332 \end{pmatrix}$$

From this it follows that $\mathbf{P}^{(8)} = \mathbf{P}^{(4)} \cdot \mathbf{P}^{(4)}$ is given (to three significant places) by

$$\mathbf{P}^{(8)} = \begin{pmatrix} 0.572 & 0.428 \\ 0.570 & 0.430 \end{pmatrix}$$

Note that the matrix $\mathbf{P}^{(8)}$ is almost identical to the matrix $\mathbf{P}^{(4)}$, and that each of the rows
of $\mathbf{P}^{(8)}$ has almost identical values. Indeed, it seems that $P^n_{i,j}$ is converging to some value
as $n \to \infty$, with this value not depending on $i$. Moreover, in Example 4.20 we showed
that the long-run proportions for this chain are $\pi_0 = 4/7 \approx .571$, $\pi_1 = 3/7 \approx .429$,
thus making it appear that these long-run proportions may also be limiting probabilities.
Although this is indeed the case for the preceding chain, it is not always true that the
long-run proportions are also limiting probabilities. To see why not, consider a two-state
Markov chain having

$$P_{0,1} = P_{1,0} = 1$$

Because this Markov chain continually alternates between states 0 and 1, the long-run
proportions of time it spends in these states are

$$\pi_0 = \pi_1 = 1/2$$

However,

$$P^n_{0,0} = \begin{cases} 1, & \text{if } n \text{ is even} \\ 0, & \text{if } n \text{ is odd} \end{cases}$$

and so $P^n_{0,0}$ does not have a limiting value as $n$ goes to infinity. In general, a chain that
can only return to a state in a multiple of $d > 1$ steps (where $d = 2$ in the preceding
example) is said to be periodic and does not have limiting probabilities. However,
an irreducible chain that is not periodic is called aperiodic, and for such a chain
the limiting probabilities will always exist and will not depend on the initial state.
Moreover, the limiting probability that the chain will be in state $j$ will equal $\pi_j$, the
long-run proportion of time the chain is in state $j$. That the limiting probabilities, when
they exist, will equal the long-run proportions can be seen by letting

$$\alpha_j = \lim_{n \to \infty} P(X_n = j)$$

and using that

$$P(X_{n+1} = j) = \sum_{i=0}^{\infty} P(X_{n+1} = j \mid X_n = i)\, P(X_n = i) = \sum_{i=0}^{\infty} P_{i,j}\, P(X_n = i)$$

and

$$1 = \sum_{i=0}^{\infty} P(X_n = i)$$

Letting $n \to \infty$ in the preceding two equations yields, upon assuming that we can
bring the limit inside the summation, that

$$\begin{aligned}
\alpha_j &= \sum_{i=0}^{\infty} \alpha_i P_{i,j} \\
1 &= \sum_{i=0}^{\infty} \alpha_i
\end{aligned}$$

Hence, $\{\alpha_j, j \ge 0\}$ satisfies the equations for which $\{\pi_j, j \ge 0\}$ is the unique solution,
showing that $\alpha_j = \pi_j$, $j \ge 0$.


4.5 Some Applications 

4.5.1 The Gambler's Ruin Problem 

Consider a gambler who at each play of the game has probability $p$ of winning one unit
and probability $q = 1 - p$ of losing one unit. Assuming that successive plays of the
game are independent, what is the probability that, starting with $i$ units, the gambler’s
fortune will reach $N$ before reaching 0?

If we let $X_n$ denote the player’s fortune at time $n$, then the process $\{X_n, n = 0, 1, 2, \dots\}$
is a Markov chain with transition probabilities

$$P_{0,0} = P_{N,N} = 1,$$

$$P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 1, 2, \dots, N - 1$$






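This Markov chain has three classes, namely, $\{0\}$, $\{1, 2, \dots, N - 1\}$, and $\{N\}$; the first
and third classes being recurrent and the second transient. Since each transient state is
visited only finitely often, it follows that, after some finite amount of time, the gambler
will either attain his goal of $N$ or go broke.

Let $P_i$, $i = 0, 1, \dots, N$, denote the probability that, starting with $i$, the gambler’s
fortune will eventually reach $N$. By conditioning on the outcome of the initial play of
the game we obtain

$$P_i = pP_{i+1} + qP_{i-1}, \quad i = 1, 2, \dots, N - 1$$

or equivalently, since $p + q = 1$,

$$pP_i + qP_i = pP_{i+1} + qP_{i-1}$$

or

$$P_{i+1} - P_i = \frac{q}{p}(P_i - P_{i-1}), \quad i = 1, 2, \dots, N - 1$$

Hence, since $P_0 = 0$, we obtain from the preceding line that

$$\begin{aligned}
P_2 - P_1 &= \frac{q}{p}(P_1 - P_0) = \frac{q}{p} P_1, \\
P_3 - P_2 &= \frac{q}{p}(P_2 - P_1) = \left(\frac{q}{p}\right)^2 P_1, \\
&\ \ \vdots \\
P_i - P_{i-1} &= \frac{q}{p}(P_{i-1} - P_{i-2}) = \left(\frac{q}{p}\right)^{i-1} P_1, \\
&\ \ \vdots \\
P_N - P_{N-1} &= \frac{q}{p}(P_{N-1} - P_{N-2}) = \left(\frac{q}{p}\right)^{N-1} P_1
\end{aligned}$$

Adding the first $i - 1$ of these equations yields

$$P_i - P_1 = P_1\left[\frac{q}{p} + \left(\frac{q}{p}\right)^2 + \cdots + \left(\frac{q}{p}\right)^{i-1}\right]$$

or

$$P_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)}\, P_1, & \text{if } \dfrac{q}{p} \ne 1 \\[2ex] iP_1, & \text{if } \dfrac{q}{p} = 1 \end{cases}$$

Now, using the fact that $P_N = 1$, we obtain

$$P_1 = \begin{cases} \dfrac{1 - (q/p)}{1 - (q/p)^N}, & \text{if } p \ne \frac{1}{2} \\[2ex] \dfrac{1}{N}, & \text{if } p = \frac{1}{2} \end{cases}$$

and hence

$$P_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N}, & \text{if } p \ne \frac{1}{2} \\[2ex] \dfrac{i}{N}, & \text{if } p = \frac{1}{2} \end{cases} \tag{4.14}$$

Note that, as $N \to \infty$,

$$P_i \to \begin{cases} 1 - \left(\dfrac{q}{p}\right)^i, & \text{if } p > \frac{1}{2} \\[2ex] 0, & \text{if } p \le \frac{1}{2} \end{cases}$$

Thus, if $p > \frac{1}{2}$, there is a positive probability that the gambler’s fortune will increase
indefinitely; while if $p \le \frac{1}{2}$, the gambler will, with probability 1, go broke against an
infinitely rich adversary.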

Example 4.28 Suppose Max and Patty decide to flip pennies; the one coming closest
to the wall wins. Patty, being the better player, has a probability 0.6 of winning on each
flip. (a) If Patty starts with five pennies and Max with ten, what is the probability that
Patty will wipe Max out? (b) What if Patty starts with 10 and Max with 20?

Solution: (a) The desired probability is obtained from Equation (4.14) by letting
$i = 5$, $N = 15$, and $p = 0.6$. Hence, the desired probability is

$$\frac{1 - \left(\frac{2}{3}\right)^{5}}{1 - \left(\frac{2}{3}\right)^{15}} \approx 0.87$$

(b) The desired probability is

$$\frac{1 - \left(\frac{2}{3}\right)^{10}}{1 - \left(\frac{2}{3}\right)^{30}} \approx 0.98$$
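Equation (4.14) translates directly into a short function, which makes it easy to reproduce the two numbers in Example 4.28 or to try other stakes. This is a minimal sketch; the function name is an illustrative choice.

```python
def reach_goal_probability(i, N, p):
    """Probability that a gambler starting with i units reaches N before 0,
    winning each play with probability p (Equation 4.14)."""
    if p == 0.5:
        return i / N
    ratio = (1.0 - p) / p          # q / p
    return (1.0 - ratio**i) / (1.0 - ratio**N)


part_a = reach_goal_probability(5, 15, 0.6)    # Patty has 5 pennies, Max has 10
part_b = reach_goal_probability(10, 30, 0.6)   # Patty has 10 pennies, Max has 20
```

Doubling both stakes helps the better player, since a longer game gives her per-flip advantage more time to show.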


For an application of the gambler’s ruin problem to drug testing, suppose that two
new drugs have been developed for treating a certain disease. Drug $i$ has a cure rate
$P_i$, $i = 1, 2$, in the sense that each patient treated with drug $i$ will be cured with
probability $P_i$. These cure rates, however, are not known, and suppose we are interested
in a method for deciding whether $P_1 > P_2$ or $P_2 > P_1$. To decide upon one of these
alternatives, consider the following test: Pairs of patients are treated sequentially with
one member of the pair receiving drug 1 and the other drug 2. The results for each pair
are determined, and the testing stops when the cumulative number of cures using one
of the drugs exceeds the cumulative number of cures when using the other by some
fixed predetermined number. More formally, let

$$X_j = \begin{cases} 1, & \text{if the patient in the } j\text{th pair to receive drug number 1 is cured} \\ 0, & \text{otherwise} \end{cases}$$

$$Y_j = \begin{cases} 1, & \text{if the patient in the } j\text{th pair to receive drug number 2 is cured} \\ 0, & \text{otherwise} \end{cases}$$

For a predetermined positive integer $M$ the test stops after pair $N$, where $N$ is the
first value of $n$ such that either

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = M$$

or

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = -M$$

In the former case we then assert that $P_1 > P_2$, and in the latter that $P_2 > P_1$.

In order to help ascertain whether the preceding is a good test, one thing we would
like to know is the probability of it leading to an incorrect decision. That is, for given
$P_1$ and $P_2$ where $P_1 > P_2$, what is the probability that the test will incorrectly assert
that $P_2 > P_1$? To determine this probability, note that after each pair is checked the
cumulative difference of cures using drug 1 versus drug 2 will either go up by 1 with
probability $P_1(1 - P_2)$ (since this is the probability that drug 1 leads to a cure and
drug 2 does not), or go down by 1 with probability $(1 - P_1)P_2$, or remain the same with
probability $P_1 P_2 + (1 - P_1)(1 - P_2)$. Hence, if we only consider those pairs in which
the cumulative difference changes, then the difference will go up 1 with probability

$$p = P\{\text{up 1} \mid \text{up 1 or down 1}\} = \frac{P_1(1 - P_2)}{P_1(1 - P_2) + (1 - P_1)P_2}$$

and down 1 with probability

$$q = 1 - p = \frac{P_2(1 - P_1)}{P_1(1 - P_2) + (1 - P_1)P_2}$$

Hence, the probability that the test will assert that $P_2 > P_1$ is equal to the probability
that a gambler who wins each (one unit) bet with probability $p$ will go down $M$ before
going up $M$. But Equation (4.14) with $i = M$, $N = 2M$, shows that this probability is
given by

$$\begin{aligned}
P\{\text{test asserts that } P_2 > P_1\} &= 1 - \frac{1 - (q/p)^M}{1 - (q/p)^{2M}} \\
&= 1 - \frac{1}{1 + (q/p)^M} \\
&= \frac{1}{1 + (p/q)^M}
\end{aligned}$$

Thus, for instance, if $P_1 = 0.6$ and $P_2 = 0.4$ then the probability of an incorrect decision
is 0.017 when $M = 5$ and reduces to 0.0003 when $M = 10$.
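The error probability $1/(1 + (p/q)^M)$ is simple to evaluate for other cure rates and thresholds. The sketch below reproduces the two figures quoted for $P_1 = 0.6$, $P_2 = 0.4$; the function name is an illustrative choice.

```python
def wrong_decision_probability(p1, p2, M):
    """Probability that the sequential test asserts P2 > P1.
    This is an error when p1 > p2."""
    up = p1 * (1.0 - p2)       # drug 1 cures its patient, drug 2 does not
    down = (1.0 - p1) * p2     # drug 2 cures its patient, drug 1 does not
    p = up / (up + down)       # chance that a changing pair moves the difference up
    q = 1.0 - p
    return 1.0 / (1.0 + (p / q) ** M)


error_m5 = wrong_decision_probability(0.6, 0.4, 5)
error_m10 = wrong_decision_probability(0.6, 0.4, 10)
```

Raising $M$ lowers the error probability geometrically, at the cost of a longer trial.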

4.5.2 A Model for Algorithmic Efficiency 

The following optimization problem is called a linear program: 

$$\begin{aligned}
&\text{minimize } \mathbf{c}\mathbf{x}, \\
&\text{subject to } \mathbf{A}\mathbf{x} = \mathbf{b}, \\
&\phantom{\text{subject to }} \mathbf{x} \ge 0
\end{aligned}$$

where $\mathbf{A}$ is an $m \times n$ matrix of fixed constants; $\mathbf{c} = (c_1, \dots, c_n)$ and $\mathbf{b} = (b_1, \dots, b_m)$
are vectors of fixed constants; and $\mathbf{x} = (x_1, \dots, x_n)$ is the $n$-vector of nonnegative
values that is to be chosen to minimize $\mathbf{c}\mathbf{x} \equiv \sum_{i=1}^{n} c_i x_i$. Supposing that $n > m$, it can
be shown that the optimal $\mathbf{x}$ can always be chosen to have at least $n - m$ components
equal to 0; that is, it can always be taken to be one of the so-called extreme points of
the feasibility region.

The simplex algorithm solves this linear program by moving from an extreme point
of the feasibility region to a better (in terms of the objective function $\mathbf{c}\mathbf{x}$) extreme point
(via the pivot operation) until the optimal is reached. Because there can be as many
as $N \equiv \binom{n}{m}$ such extreme points, it would seem that this method might take many
iterations, but, surprisingly to some, this does not appear to be the case in practice.

To obtain a feel for whether or not the preceding statement is surprising, let us
consider a simple probabilistic (Markov chain) model as to how the algorithm moves
along the extreme points. Specifically, we will suppose that if at any time the algorithm
is at the $j$th best extreme point then after the next pivot the resulting extreme point is
equally likely to be any of the $j - 1$ best. Under this assumption, we show that the time
to get from the $N$th best to the best extreme point has approximately, for large $N$, a
normal distribution with mean and variance equal to the logarithm (base $e$) of $N$.
Consider a Markov chain for which $P_{1,1} = 1$ and

$$P_{i,j} = \frac{1}{i - 1}, \quad j = 1, \dots, i - 1, \ i > 1$$

and let $T_i$ denote the number of transitions needed to go from state $i$ to state 1. A
recursive formula for $E[T_i]$ can be obtained by conditioning on the initial transition:

$$E[T_i] = 1 + \frac{1}{i - 1} \sum_{j=1}^{i-1} E[T_j]$$

Starting with $E[T_1] = 0$, we successively see that

$$\begin{aligned}
E[T_2] &= 1, \\
E[T_3] &= 1 + \tfrac{1}{2}, \\
E[T_4] &= 1 + \tfrac{1}{3}\left(1 + 1 + \tfrac{1}{2}\right) = 1 + \tfrac{1}{2} + \tfrac{1}{3}
\end{aligned}$$

and it is not difficult to guess and then prove inductively that

$$E[T_i] = \sum_{j=1}^{i-1} 1/j$$

However, to obtain a more complete description of $T_N$, we will use the representation

$$T_N = \sum_{j=1}^{N-1} I_j$$

where

$$I_j = \begin{cases} 1, & \text{if the process ever enters } j \\ 0, & \text{otherwise} \end{cases}$$

The importance of the preceding representation stems from the following:

Proposition 4.7 $I_1, \dots, I_{N-1}$ are independent and

$$P\{I_j = 1\} = 1/j, \quad 1 \le j \le N - 1$$

Proof. Given $I_{j+1}, \dots, I_N$, let $n = \min\{i : i > j, I_i = 1\}$ denote the lowest numbered
state, greater than $j$, that is entered. Thus we know that the process enters state $n$
and the next state entered is one of the states $1, 2, \dots, j$. Hence, as the next state from
state $n$ is equally likely to be any of the lower number states $1, 2, \dots, n - 1$ we see that

$$P\{I_j = 1 \mid I_{j+1}, \dots, I_N\} = \frac{1/(n - 1)}{j/(n - 1)} = 1/j$$

Hence, $P\{I_j = 1\} = 1/j$, and independence follows since the preceding conditional
probability does not depend on $I_{j+1}, \dots, I_N$. ■

Corollary 4.8

(i) $E[T_N] = \sum_{j=1}^{N-1} 1/j$.

(ii) $\text{Var}(T_N) = \sum_{j=1}^{N-1} (1/j)(1 - 1/j)$.

(iii) For $N$ large, $T_N$ has approximately a normal distribution with mean $\log N$ and
variance $\log N$.

Proof. Parts (i) and (ii) follow from Proposition 4.7 and the representation $T_N = \sum_{j=1}^{N-1} I_j$.
Part (iii) follows from the central limit theorem since

$$\int_1^N \frac{dx}{x} < \sum_{j=1}^{N-1} 1/j < 1 + \int_1^{N-1} \frac{dx}{x}$$

or

$$\log N < \sum_{j=1}^{N-1} 1/j < 1 + \log(N - 1)$$

and so

$$\log N \approx \sum_{j=1}^{N-1} 1/j \qquad ■$$


Returning to the simplex algorithm, if we assume that $n$, $m$, and $n - m$ are all large,
we have by Stirling’s approximation that

$$N = \binom{n}{m} \sim \frac{n^{n+1/2}}{(n - m)^{n-m+1/2}\, m^{m+1/2}\, \sqrt{2\pi}}$$

and so, letting $c = n/m$,

$$\log N \sim \left(mc + \tfrac{1}{2}\right) \log(mc) - \left(m(c - 1) + \tfrac{1}{2}\right) \log(m(c - 1)) - \left(m + \tfrac{1}{2}\right) \log m - \tfrac{1}{2} \log(2\pi)$$

or

$$\log N \sim m\left[c \log\frac{c}{c - 1} + \log(c - 1)\right]$$

Now, as $\lim_{x \to \infty} x \log[x/(x - 1)] = 1$, it follows that, when $c$ is large,

$$\log N \sim m[1 + \log(c - 1)]$$

Thus, for instance, if $n = 8000$, $m = 1000$, then the number of necessary transitions is
approximately normally distributed with mean and variance equal to $1000(1 + \log 7) \approx 3000$.
Hence, the number of necessary transitions would be roughly between

$$3000 \pm 2\sqrt{3000} \quad \text{or roughly} \quad 3000 \pm 110$$

95 percent of the time.
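The quantities in this model are easy to evaluate numerically. The sketch below checks that the recursion for $E[T_i]$ agrees with the harmonic sum, and computes $\log \binom{n}{m}$ for $n = 8000$, $m = 1000$ using the log-gamma function instead of Stirling's approximation. The function names are illustrative, and using `lgamma` is a convenience chosen here, not the method of the text.

```python
from math import lgamma, log


def mean_steps_by_recursion(N):
    """E[T_N] from E[T_i] = 1 + (1 / (i - 1)) * sum_{j < i} E[T_j], with E[T_1] = 0."""
    running_sum = 0.0          # sum of E[T_j] over j < i
    mean = 0.0                 # E[T_1]
    for i in range(2, N + 1):
        running_sum += mean    # add E[T_{i-1}]
        mean = 1.0 + running_sum / (i - 1)
    return mean


def harmonic(n):
    """1 + 1/2 + ... + 1/n."""
    return sum(1.0 / j for j in range(1, n + 1))


def log_binomial(n, m):
    """Natural log of C(n, m), computed with log-gamma to avoid huge integers."""
    return lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)


log_extreme_points = log_binomial(8000, 1000)       # log N for the example
large_c_estimate = 1000 * (1 + log(7))              # m[1 + log(c - 1)] with c = 8
```

Both values are close to 3000, which is the rounded figure used in the text for the mean and variance of the number of pivots.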

4.5.3 Using a Random Walk to Analyze a Probabilistic Algorithm for the 
Satisfiability Problem 

Consider a Markov chain with states $0, 1, \dots, n$ having

$$P_{0,1} = 1, \quad P_{i,i+1} = p, \quad P_{i,i-1} = q = 1 - p, \quad 1 \le i < n$$

and suppose that we are interested in studying the time that it takes for the chain to go
from state 0 to state $n$. One approach to obtaining the mean time to reach state $n$ would
be to let $m_i$ denote the mean time to go from state $i$ to state $n$, $i = 0, \dots, n - 1$. If we
then condition on the initial transition, we obtain the following set of equations:

$$\begin{aligned}
m_0 &= 1 + m_1, \\
m_i &= E[\text{time to reach } n \mid \text{next state is } i + 1]\, p + E[\text{time to reach } n \mid \text{next state is } i - 1]\, q \\
&= (1 + m_{i+1})p + (1 + m_{i-1})q \\
&= 1 + p m_{i+1} + q m_{i-1}, \quad i = 1, \dots, n - 1
\end{aligned}$$

Whereas the preceding equations can be solved for $m_i$, $i = 0, \dots, n - 1$, we do not
pursue their solution; we instead make use of the special structure of the Markov chain
to obtain a simpler set of equations. To start, let $N_i$ denote the number of additional
transitions that it takes the chain when it first enters state $i$ until it enters state $i + 1$. By
the Markovian property, it follows that these random variables $N_i$, $i = 0, \dots, n - 1$
are independent. Also, we can express $N_{0,n}$, the number of transitions that it takes the
chain to go from state 0 to state $n$, as

$$N_{0,n} = \sum_{i=0}^{n-1} N_i \tag{4.15}$$


Letting $\mu_i = E[N_i]$ we obtain, upon conditioning on the next transition after the chain
enters state $i$, that for $i = 1, \dots, n - 1$

$$\mu_i = 1 + E[\text{number of additional transitions to reach } i + 1 \mid \text{chain to } i - 1]\, q$$

Now, if the chain next enters state $i - 1$, then in order for it to reach $i + 1$ it must first
return to state $i$ and must then go from state $i$ to state $i + 1$. Hence, we have from the
preceding that

$$\mu_i = 1 + E[N^*_{i-1} + N^*_i]\, q$$

where $N^*_{i-1}$ and $N^*_i$ are, respectively, the additional number of transitions to return
to state $i$ from $i - 1$ and the number to then go from $i$ to $i + 1$. Now, it follows
from the Markovian property that these random variables have, respectively, the same
distributions as $N_{i-1}$ and $N_i$. In addition, they are independent (although we will only
use this when we compute the variance of $N_{0,n}$). Hence, we see that

$$\mu_i = 1 + q(\mu_{i-1} + \mu_i)$$

or

$$\mu_i = \frac{1}{p} + \frac{q}{p}\, \mu_{i-1}, \quad i = 1, \dots, n - 1$$

Starting with $\mu_0 = 1$, and letting $\alpha = q/p$, we obtain from the preceding recursion
that

$$\begin{aligned}
\mu_1 &= 1/p + \alpha, \\
\mu_2 &= 1/p + \alpha(1/p + \alpha) = 1/p + \alpha/p + \alpha^2, \\
\mu_3 &= 1/p + \alpha(1/p + \alpha/p + \alpha^2) = 1/p + \alpha/p + \alpha^2/p + \alpha^3
\end{aligned}$$

In general, we see that

$$\mu_i = \frac{1}{p} \sum_{j=0}^{i-1} \alpha^j + \alpha^i, \quad i = 1, \dots, n - 1 \tag{4.16}$$

Using Equation (4.15), we now get

$$E[N_{0,n}] = 1 + \frac{1}{p} \sum_{i=1}^{n-1} \sum_{j=0}^{i-1} \alpha^j + \sum_{i=1}^{n-1} \alpha^i$$

When $p = \frac{1}{2}$, and so $\alpha = 1$, we see from the preceding that

$$E[N_{0,n}] = 1 + (n - 1)n + n - 1 = n^2$$

When $p \ne \frac{1}{2}$, we obtain

$$\begin{aligned}
E[N_{0,n}] &= 1 + \frac{1}{p(1 - \alpha)} \sum_{i=1}^{n-1} (1 - \alpha^i) + \frac{\alpha - \alpha^n}{1 - \alpha} \\
&= 1 + \frac{1 + \alpha}{1 - \alpha} \left[n - 1 - \frac{\alpha - \alpha^n}{1 - \alpha}\right] + \frac{\alpha - \alpha^n}{1 - \alpha} \\
&= 1 + \frac{2\alpha^{n+1} - (n + 1)\alpha^2 + n - 1}{(1 - \alpha)^2}
\end{aligned}$$

where the second equality used the fact that $p = 1/(1 + \alpha)$. Therefore, we see that when
$\alpha > 1$, or equivalently when $p < \frac{1}{2}$, the expected number of transitions to reach $n$ is an
exponentially increasing function of $n$. On the other hand, when $p = \frac{1}{2}$, $E[N_{0,n}] = n^2$,
and when $p > \frac{1}{2}$, $E[N_{0,n}]$ is, for large $n$, essentially linear in $n$.
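The recursion $\mu_i = 1/p + (q/p)\mu_{i-1}$ with $\mu_0 = 1$ gives a direct way to compute $E[N_{0,n}] = \sum_{i=0}^{n-1} \mu_i$ and to check it against the closed forms above. The following sketch does so; the function names and the sample values of $n$ and $p$ are illustrative assumptions.

```python
def mean_hitting_time(n, p):
    """E[N_{0,n}]: the sum of mu_0, ..., mu_{n-1}, where mu_0 = 1 and
    mu_i = 1/p + (q/p) * mu_{i-1}."""
    q = 1.0 - p
    mu = 1.0               # mu_0
    total = mu
    for _ in range(1, n):
        mu = 1.0 / p + (q / p) * mu
        total += mu
    return total


def mean_hitting_time_closed_form(n, p):
    """Closed form for p != 1/2, written in terms of alpha = q / p."""
    a = (1.0 - p) / p
    return 1.0 + (2.0 * a ** (n + 1) - (n + 1) * a**2 + n - 1) / (1.0 - a) ** 2


symmetric = mean_hitting_time(20, 0.5)     # equals n^2 = 400
favourable = mean_hitting_time(20, 0.6)    # grows roughly linearly in n
unfavourable = mean_hitting_time(20, 0.4)  # grows exponentially in n
```

Comparing the three values for the same $n$ illustrates the linear, quadratic, and exponential regimes described in the text.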

Let us now compute $\text{Var}(N_{0,n})$. To do so, we will again make use of the representation
given by Equation (4.15). Letting $v_i = \text{Var}(N_i)$, we start by determining the $v_i$
recursively by using the conditional variance formula. Let $S_i = 1$ if the first transition
out of state $i$ is into state $i + 1$, and let $S_i = -1$ if the transition is into state $i - 1$,
$i = 1, \dots, n - 1$. Then,

$$\begin{aligned}
\text{given that } S_i = 1&: \quad N_i = 1 \\
\text{given that } S_i = -1&: \quad N_i = 1 + N^*_{i-1} + N^*_i
\end{aligned}$$

Hence,

$$\begin{aligned}
E[N_i \mid S_i = 1] &= 1, \\
E[N_i \mid S_i = -1] &= 1 + \mu_{i-1} + \mu_i
\end{aligned}$$

implying that

$$\begin{aligned}
\text{Var}(E[N_i \mid S_i]) &= \text{Var}(E[N_i \mid S_i] - 1) \\
&= (\mu_{i-1} + \mu_i)^2 q - (\mu_{i-1} + \mu_i)^2 q^2 \\
&= qp(\mu_{i-1} + \mu_i)^2
\end{aligned}$$

Also, since $N^*_{i-1}$ and $N^*_i$, the numbers of transitions to return from state $i - 1$ to $i$
and to then go from state $i$ to state $i + 1$ are, by the Markovian property, independent
random variables having the same distributions as $N_{i-1}$ and $N_i$, respectively, we see
that

$$\begin{aligned}
\text{Var}(N_i \mid S_i = 1) &= 0, \\
\text{Var}(N_i \mid S_i = -1) &= v_{i-1} + v_i
\end{aligned}$$

Hence,

$$E[\text{Var}(N_i \mid S_i)] = q(v_{i-1} + v_i)$$

From the conditional variance formula, we thus obtain

$$v_i = pq(\mu_{i-1} + \mu_i)^2 + q(v_{i-1} + v_i)$$

or, equivalently,

$$v_i = q(\mu_{i-1} + \mu_i)^2 + \alpha v_{i-1}, \quad i = 1, \dots, n - 1$$

Starting with $v_0 = 0$, we obtain from the preceding recursion that

$$\begin{aligned}
v_1 &= q(\mu_0 + \mu_1)^2, \\
v_2 &= q(\mu_1 + \mu_2)^2 + \alpha q(\mu_0 + \mu_1)^2, \\
v_3 &= q(\mu_2 + \mu_3)^2 + \alpha q(\mu_1 + \mu_2)^2 + \alpha^2 q(\mu_0 + \mu_1)^2
\end{aligned}$$

In general, we have for $i > 0$,

$$v_i = q \sum_{j=1}^{i} \alpha^{i-j} (\mu_{j-1} + \mu_j)^2 \tag{4.17}$$

Therefore, we see that

$$\text{Var}(N_{0,n}) = \sum_{i=0}^{n-1} v_i = q \sum_{i=1}^{n-1} \sum_{j=1}^{i} \alpha^{i-j} (\mu_{j-1} + \mu_j)^2$$

where $\mu_j$ is given by Equation (4.16).

We see from Equations (4.16) and (4.17) that when $p \ge \frac{1}{2}$, and so $\alpha \le 1$, $\mu_i$
and $v_i$, the mean and variance of the number of transitions to go from state $i$ to $i + 1$,
do not increase too rapidly in $i$. For instance, when $p = \frac{1}{2}$ it follows from Equations
(4.16) and (4.17) that

$$\mu_i = 2i + 1$$

and

$$v_i = \frac{1}{2} \sum_{j=1}^{i} (4j)^2 = 8 \sum_{j=1}^{i} j^2$$

Hence, since $N_{0,n}$ is the sum of independent random variables, which are of roughly
similar magnitudes when $p \ge \frac{1}{2}$, it follows in this case from the central limit theorem
that $N_{0,n}$ is, for large $n$, approximately normally distributed. In particular, when $p = \frac{1}{2}$,
$N_{0,n}$ is approximately normal with mean $n^2$ and variance

$$\begin{aligned}
\text{Var}(N_{0,n}) &= 8 \sum_{i=1}^{n-1} \sum_{j=1}^{i} j^2 \\
&= 8 \sum_{j=1}^{n-1} \sum_{i=j}^{n-1} j^2 \\
&= 8 \sum_{j=1}^{n-1} (n - j) j^2 \\
&\approx 8 \int_1^{n-1} (n - x) x^2\, dx \\
&\approx \frac{2}{3} n^4
\end{aligned}$$

Example 4.29 (The Satisfiability Problem) A Boolean variable $x$ is one that takes
on either of two values: TRUE or FALSE. If $x_i$, $i \ge 1$, are Boolean variables, then a
Boolean clause of the form

$$x_1 + \bar{x}_2 + x_3$$

is TRUE if $x_1$ is TRUE, or if $x_2$ is FALSE, or if $x_3$ is TRUE. That is, the symbol “+”
means “or” and $\bar{x}$ is TRUE if $x$ is FALSE and vice versa. A Boolean formula is a
combination of clauses such as

$$(x_1 + \bar{x}_2) * (x_1 + x_3) * (x_2 + \bar{x}_3) * (\bar{x}_1 + \bar{x}_2) * (x_1 + x_2)$$

In the preceding, the terms between the parentheses represent clauses, and the formula
is TRUE if all the clauses are TRUE, and is FALSE otherwise. For a given Boolean
formula, the satisfiability problem is either to determine values for the variables that
result in the formula being TRUE, or to determine that the formula is never true. For
instance, one set of values that makes the preceding formula TRUE is to set $x_1$ =
TRUE, $x_2$ = FALSE, and $x_3$ = FALSE.

Consider a formula of the $n$ Boolean variables $x_1, \ldots, x_n$ and suppose that each
clause in this formula refers to exactly two variables. We will now present a probabilistic
algorithm that will either find values that satisfy the formula or determine to a high
probability that it is not possible to satisfy it. To begin, start with an arbitrary setting
of values. Then, at each stage choose a clause whose value is FALSE, and randomly
choose one of the Boolean variables in that clause and change its value. That is, if the
variable has value TRUE then change its value to FALSE, and vice versa. If this new
setting makes the formula TRUE then stop, otherwise continue in the same fashion.

If you have not stopped after $n^2(1 + 4\sqrt{2/3})$ repetitions, then declare that the formula
cannot be satisfied. We will now argue that if there is a satisfiable assignment then this
algorithm will find such an assignment with a probability very close to 1.

Let us start by assuming that there is a satisfiable assignment of truth values and let
$\mathcal{A}$ be such an assignment. At each stage of the algorithm there is a certain assignment
of values. Let $Y_j$ denote the number of the $n$ variables whose values at the $j$th stage of
the algorithm agree with their values in $\mathcal{A}$. For instance, suppose that $n = 3$ and $\mathcal{A}$
consists of the settings $x_1 = x_2 = x_3$ = TRUE. If the assignment of values at the $j$th
step of the algorithm is $x_1$ = TRUE, $x_2 = x_3$ = FALSE, then $Y_j = 1$. Now, at each
stage, the algorithm considers a clause that is not satisfied, thus implying that at least
one of the values of the two variables in this clause does not agree with its value in $\mathcal{A}$.
As a result, when we randomly choose one of the variables in this clause then there
is a probability of at least $\frac{1}{2}$ that $Y_{j+1} = Y_j + 1$ and at most $\frac{1}{2}$ that $Y_{j+1} = Y_j - 1$.
That is, independent of what has previously transpired in the algorithm, at each stage
the number of settings in agreement with those in $\mathcal{A}$ will either increase or decrease
by 1 and the probability of an increase is at least $\frac{1}{2}$ (it is 1 if both variables have values
different from their values in $\mathcal{A}$). Thus, even though the process $Y_j$, $j \ge 0$, is not itself a
Markov chain (why not?) it is intuitively clear that both the expectation and the variance
of the number of stages of the algorithm needed to obtain the values of $\mathcal{A}$ will be less
than or equal to the expectation and variance of the number of transitions to go from
state 0 to state $n$ in the Markov chain of Section 4.5.2. Hence, if the algorithm has
not yet terminated because it found a set of satisfiable values different from that of
$\mathcal{A}$, it will do so within an expected time of at most $n^2$ and with a standard deviation
of at most $n^2\sqrt{2/3}$. In addition, since the time for the Markov chain to go from 0 to
$n$ is approximately normal when $n$ is large we can be quite certain that a satisfiable
assignment will be reached by $n^2 + 4(n^2\sqrt{2/3})$ stages, and thus if one has not been
found by this number of stages of the algorithm we can be quite certain that there is no
satisfiable assignment.

Our analysis also makes it clear why we assumed that there are only two variables
in each clause. For if there were $k$, $k > 2$, variables in a clause then as any clause that is
not presently satisfied may have only one incorrect setting, a randomly chosen variable
whose value is changed might only increase the number of values in agreement with
$\mathcal{A}$ with probability $1/k$ and so we could only conclude from our prior Markov chain
results that the mean time to obtain the values in $\mathcal{A}$ is an exponential function of $n$,
which is not an efficient algorithm when $n$ is large. ■
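
The random flipping procedure of this example can be sketched in a few lines. The code below is a minimal illustration, not a tuned solver. It encodes the five-clause formula with explicit negations (the placement of the negations is an assumption, chosen so that $x_1$ = TRUE, $x_2 = x_3$ = FALSE satisfies it, as the text states), starts from the all-FALSE assignment, and gives up after $n^2(1 + 4\sqrt{2/3})$ flips. All names are hypothetical.

```python
import math
import random

# A literal is (variable_index, negated); a clause is a pair of literals.
# Assumed formula: (x1 + ~x2)(x1 + x3)(x2 + ~x3)(~x1 + ~x2)(x1 + x2)
FORMULA = [
    ((0, False), (1, True)),
    ((0, False), (2, False)),
    ((1, False), (2, True)),
    ((0, True), (1, True)),
    ((0, False), (1, False)),
]

def literal_value(literal, values):
    index, negated = literal
    return values[index] != negated

def unsatisfied(formula, values):
    """Clauses whose two literals are both FALSE under the current values."""
    return [c for c in formula
            if not any(literal_value(lit, values) for lit in c)]

def random_walk_2sat(formula, n, rng):
    """Return a satisfying list of booleans, or None once the budget is spent."""
    values = [False] * n                          # arbitrary starting setting
    budget = math.ceil(n * n * (1 + 4 * math.sqrt(2 / 3)))
    for _ in range(budget):
        false_clauses = unsatisfied(formula, values)
        if not false_clauses:
            return values
        index, _negated = rng.choice(false_clauses[0])  # one of its variables
        values[index] = not values[index]               # flip that variable
    return values if not unsatisfied(formula, values) else None

print(random_walk_2sat(FORMULA, 3, random.Random(0)))
```

A return value of None only means that the budget ran out, so for a satisfiable formula it signals a rare failure rather than a proof of unsatisfiability.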


4.6 Mean Time Spent in Transient States 

Consider now a finite state Markov chain and suppose that the states are numbered so 
that $T = \{1, 2, \ldots, t\}$ denotes the set of transient states. Let

$$\mathbf{P}_T = \begin{bmatrix}
P_{11} & P_{12} & \cdots & P_{1t} \\
\vdots & \vdots & & \vdots \\
P_{t1} & P_{t2} & \cdots & P_{tt}
\end{bmatrix}$$

and note that since $\mathbf{P}_T$ specifies only the transition probabilities from transient states
into transient states, some of its row sums are less than 1 (otherwise, T would be a 
closed class of states). 

For transient states $i$ and $j$, let $s_{ij}$ denote the expected number of time periods that
the Markov chain is in state $j$, given that it starts in state $i$. Let $\delta_{i,j} = 1$ when $i = j$
and let it be 0 otherwise. Condition on the initial transition to obtain

$$\begin{aligned}
s_{ij} &= \delta_{i,j} + \sum_{k} P_{ik} s_{kj} \\
&= \delta_{i,j} + \sum_{k=1}^{t} P_{ik} s_{kj} \qquad (4.18)
\end{aligned}$$

where the final equality follows since it is impossible to go from a recurrent to a transient
state, implying that $s_{kj} = 0$ when $k$ is a recurrent state.

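Let $\mathbf{S}$ denote the matrix of values $s_{ij}$, $i, j = 1, \ldots, t$. That is,

$$\mathbf{S} = \begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1t} \\
\vdots & \vdots & & \vdots \\
s_{t1} & s_{t2} & \cdots & s_{tt}
\end{bmatrix}$$

In matrix notation, Equation (4.18) can be written as

$$\mathbf{S} = \mathbf{I} + \mathbf{P}_T \mathbf{S}$$

where $\mathbf{I}$ is the identity matrix of size $t$. Because the preceding equation is equivalent to

$$(\mathbf{I} - \mathbf{P}_T)\mathbf{S} = \mathbf{I}$$

we obtain, upon multiplying both sides by $(\mathbf{I} - \mathbf{P}_T)^{-1}$,

$$\mathbf{S} = (\mathbf{I} - \mathbf{P}_T)^{-1}$$

That is, the quantities $s_{ij}$, $i \in T$, $j \in T$, can be obtained by inverting the matrix $\mathbf{I} - \mathbf{P}_T$.
(The existence of the inverse is easily established.)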

Example 4.30 Consider the gambler’s ruin problem with p = 0.4 and N = 7. 
Starting with 3 units, determine 

(a) the expected amount of time the gambler has 5 units, 

(b) the expected amount of time the gambler has 2 units. 

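Solution: The matrix $\mathbf{P}_T$, which specifies $P_{ij}$, $i, j \in \{1, 2, 3, 4, 5, 6\}$, is as
follows (rows and columns are indexed by the states 1 through 6):

$$\mathbf{P}_T = \begin{bmatrix}
0 & 0.4 & 0 & 0 & 0 & 0 \\
0.6 & 0 & 0.4 & 0 & 0 & 0 \\
0 & 0.6 & 0 & 0.4 & 0 & 0 \\
0 & 0 & 0.6 & 0 & 0.4 & 0 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.6 & 0
\end{bmatrix}$$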

Inverting $\mathbf{I} - \mathbf{P}_T$ gives

$$\mathbf{S} = (\mathbf{I} - \mathbf{P}_T)^{-1} = \begin{bmatrix}
1.6149 & 1.0248 & 0.6314 & 0.3691 & 0.1943 & 0.0777 \\
1.5372 & 2.5619 & 1.5784 & 0.9228 & 0.4857 & 0.1943 \\
1.4206 & 2.3677 & 2.9990 & 1.7533 & 0.9228 & 0.3691 \\
1.2458 & 2.0763 & 2.6299 & 2.9990 & 1.5784 & 0.6314 \\
0.9835 & 1.6391 & 2.0763 & 2.3677 & 2.5619 & 1.0248 \\
0.5901 & 0.9835 & 1.2458 & 1.4206 & 1.5372 & 1.6149
\end{bmatrix}$$

Hence,

$$s_{3,5} = 0.9228, \qquad s_{3,2} = 2.3677$$

For $i \in T$, $j \in T$, the quantity $f_{ij}$, equal to the probability that the Markov chain
ever makes a transition into state $j$ given that it starts in state $i$, is easily determined
from $\mathbf{P}_T$. To determine the relationship, let us start by deriving an expression for $s_{ij}$ by
conditioning on whether state $j$ is ever entered. This yields

$$\begin{aligned}
s_{ij} &= E[\text{time in } j \mid \text{start in } i, \text{ever transit to } j] \, f_{ij} \\
&\quad + E[\text{time in } j \mid \text{start in } i, \text{never transit to } j] \, (1 - f_{ij}) \\
&= (\delta_{i,j} + s_{jj}) f_{ij} + \delta_{i,j} (1 - f_{ij}) \\
&= \delta_{i,j} + f_{ij} s_{jj}
\end{aligned}$$

since $s_{jj}$ is the expected number of additional time periods spent in state $j$ given that
it is eventually entered from state $i$. Solving the preceding equation yields

$$f_{ij} = \frac{s_{ij} - \delta_{i,j}}{s_{jj}}$$


Example 4.31 In Example 4.30, what is the probability that the gambler ever has a 
fortune of 1? 

Solution: Since $s_{3,1} = 1.4206$ and $s_{1,1} = 1.6149$, then

$$f_{3,1} = \frac{s_{3,1}}{s_{1,1}} = 0.8797$$

As a check, note that $f_{3,1}$ is just the probability that a gambler starting with 3 reaches
1 before 7. That is, it is the probability that the gambler’s fortune will go down 2
before going up 4; which is the probability that a gambler starting with 2 will go
broke before reaching 6. Therefore,

$$f_{3,1} = 1 - \frac{1 - (0.6/0.4)^2}{1 - (0.6/0.4)^6} = 0.8797$$

which checks with our earlier answer.
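
The two routes to $f_{3,1}$ can be compared directly. This short sketch takes the four-decimal entries of $\mathbf{S}$ quoted in the solution as given, so agreement is only expected to about four decimal places. The variable names are illustrative.

```python
s_31, s_11 = 1.4206, 1.6149            # entries of S quoted in the solution
f_from_S = (s_31 - 0) / s_11           # delta_{3,1} = 0 because 3 != 1

ratio = 0.6 / 0.4                      # q / p for the gambler's ruin chain
reach_6_first = (1 - ratio ** 2) / (1 - ratio ** 6)   # start at 2, hit 6 before 0
f_from_ruin = 1 - reach_6_first        # start at 2, go broke before reaching 6

print(round(f_from_S, 4), round(f_from_ruin, 4))      # 0.8797 0.8797
```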

Suppose we are interested in the expected time until the Markov chain enters some
set of states $A$, which need not be the set of recurrent states. We can reduce this back
to the previous situation by making all states in $A$ absorbing states. That is, reset the
transition probabilities of states in $A$ to satisfy

$$P_{i,i} = 1, \quad i \in A$$

This transforms the states of A into recurrent states, and transforms any state outside 
of A from which an eventual transition into A is possible into a transient state. Thus, 
our previous approach can be used. 


4.7 Branching Processes 

In this section we consider a class of Markov chains, known as branching processes, 
which have a wide variety of applications in the biological, sociological, and engineering
sciences.

Consider a population consisting of individuals able to produce offspring of the same
kind. Suppose that each individual will, by the end of its lifetime, have produced $j$ new
offspring with probability $P_j$, $j \ge 0$, independently of the numbers produced by other
individuals. We suppose that $P_j < 1$ for all $j \ge 0$. The number of individuals initially
present, denoted by $X_0$, is called the size of the zeroth generation. All offspring of the
zeroth generation constitute the first generation and their number is denoted by $X_1$. In
general, let $X_n$ denote the size of the $n$th generation. It follows that $\{X_n, n = 0, 1, \ldots\}$
is a Markov chain having as its state space the set of nonnegative integers.

Note that state 0 is a recurrent state, since clearly $P_{00} = 1$. Also, if $P_0 > 0$, all other
states are transient. This follows since $P_{i0} = P_0^i$, which implies that starting with $i$
individuals there is a positive probability of at least $P_0^i$ that no later generation will ever
consist of $i$ individuals. Moreover, since any finite set of transient states $\{1, 2, \ldots, n\}$
will be visited only finitely often, this leads to the important conclusion that, if $P_0 > 0$,
then the population will either die out or its size will converge to infinity.

Let

$$\mu = \sum_{j=0}^{\infty} j P_j$$

denote the mean number of offspring of a single individual, and let

$$\sigma^2 = \sum_{j=0}^{\infty} (j - \mu)^2 P_j$$

be the variance of the number of offspring produced by a single individual.

Let us suppose that $X_0 = 1$, that is, initially there is a single individual present. We
calculate $E[X_n]$ and $\mathrm{Var}(X_n)$ by first noting that we may write

$$X_n = \sum_{i=1}^{X_{n-1}} Z_i$$

where $Z_i$ represents the number of offspring of the $i$th individual of the $(n - 1)$st
generation. By conditioning on $X_{n-1}$, we obtain

$$\begin{aligned}
E[X_n] &= E[E[X_n \mid X_{n-1}]] \\
&= E\left[ E\left[ \sum_{i=1}^{X_{n-1}} Z_i \,\Big|\, X_{n-1} \right] \right] \\
&= E[X_{n-1} \mu] \\
&= \mu E[X_{n-1}]
\end{aligned}$$

where we have used the fact that $E[Z_i] = \mu$. Since $E[X_0] = 1$, the preceding yields

$$E[X_1] = \mu,$$

$$E[X_2] = \mu E[X_1] = \mu^2,$$

$$E[X_n] = \mu E[X_{n-1}] = \mu^n$$

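Similarly, $\mathrm{Var}(X_n)$ may be obtained by using the conditional variance formula

$$\mathrm{Var}(X_n) = E[\mathrm{Var}(X_n \mid X_{n-1})] + \mathrm{Var}(E[X_n \mid X_{n-1}])$$

Now, given $X_{n-1}$, $X_n$ is just the sum of $X_{n-1}$ independent random variables each
having the distribution $\{P_j, j \ge 0\}$. Hence,

$$E[X_n \mid X_{n-1}] = X_{n-1}\mu, \qquad \mathrm{Var}(X_n \mid X_{n-1}) = X_{n-1}\sigma^2$$

The conditional variance formula now yields

$$\begin{aligned}
\mathrm{Var}(X_n) &= E[X_{n-1}\sigma^2] + \mathrm{Var}(X_{n-1}\mu) \\
&= \sigma^2 \mu^{n-1} + \mu^2 \mathrm{Var}(X_{n-1}) \\
&= \sigma^2 \mu^{n-1} + \mu^2 \left( \sigma^2 \mu^{n-2} + \mu^2 \mathrm{Var}(X_{n-2}) \right) \\
&= \sigma^2 (\mu^{n-1} + \mu^n) + \mu^4 \mathrm{Var}(X_{n-2}) \\
&= \sigma^2 (\mu^{n-1} + \mu^n) + \mu^4 \left( \sigma^2 \mu^{n-3} + \mu^2 \mathrm{Var}(X_{n-3}) \right) \\
&= \sigma^2 (\mu^{n-1} + \mu^n + \mu^{n+1}) + \mu^6 \mathrm{Var}(X_{n-3}) \\
&= \cdots \\
&= \sigma^2 (\mu^{n-1} + \mu^n + \cdots + \mu^{2n-2}) + \mu^{2n} \mathrm{Var}(X_0) \\
&= \sigma^2 (\mu^{n-1} + \mu^n + \cdots + \mu^{2n-2})
\end{aligned}$$

where the last equality holds because $X_0 = 1$ is constant, so that $\mathrm{Var}(X_0) = 0$.
Therefore, summing the geometric series,

$$\mathrm{Var}(X_n) = \begin{cases}
\sigma^2 \mu^{n-1} \dfrac{1 - \mu^n}{1 - \mu}, & \text{if } \mu \ne 1 \\[2ex]
n \sigma^2, & \text{if } \mu = 1
\end{cases} \qquad (4.19)$$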
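
The recursion and the closed form in Equation (4.19) can be checked against each other with exact arithmetic. The sketch below assumes the offspring distribution is supplied as a list `dist` with `dist[j]` $= P_j$. The example distribution is the one used later in Example 4.33, and the function names are illustrative.

```python
from fractions import Fraction

def offspring_moments(dist):
    """Mean and variance of the offspring distribution, dist[j] = P_j."""
    mu = sum(j * p for j, p in enumerate(dist))
    sigma2 = sum((j - mu) ** 2 * p for j, p in enumerate(dist))
    return mu, sigma2

def generation_moments(dist, n):
    """E[X_n] and Var(X_n) for X_0 = 1, using the one-step recursions."""
    mu, sigma2 = offspring_moments(dist)
    mean, var = Fraction(1), Fraction(0)          # X_0 = 1 is deterministic
    for _ in range(n):
        # Var(X_k) = sigma^2 E[X_{k-1}] + mu^2 Var(X_{k-1}); E[X_k] = mu E[X_{k-1}]
        mean, var = mu * mean, sigma2 * mean + mu ** 2 * var
    return mean, var

def closed_form_variance(dist, n):
    """Right-hand side of Equation (4.19)."""
    mu, sigma2 = offspring_moments(dist)
    if mu == 1:
        return n * sigma2
    return sigma2 * mu ** (n - 1) * (1 - mu ** n) / (1 - mu)

dist = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]   # mu = 5/4
print(generation_moments(dist, 3), closed_form_variance(dist, 3))
```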

Let $\pi_0$ denote the probability that the population will eventually die out (under the
assumption that $X_0 = 1$). More formally,

$$\pi_0 = \lim_{n \to \infty} P\{X_n = 0 \mid X_0 = 1\}$$

The problem of determining the value of $\pi_0$ was first raised in connection with the
extinction of family surnames by Galton in 1889.

We first note that $\pi_0 = 1$ if $\mu < 1$. This follows since

$$\begin{aligned}
\mu^n = E[X_n] &= \sum_{j=1}^{\infty} j P\{X_n = j\} \\
&\ge \sum_{j=1}^{\infty} 1 \cdot P\{X_n = j\} \\
&= P\{X_n \ge 1\}
\end{aligned}$$

Since $\mu^n \to 0$ when $\mu < 1$, it follows that $P\{X_n \ge 1\} \to 0$, and hence $P\{X_n = 0\} \to 1$.

In fact, it can be shown that $\pi_0 = 1$ even when $\mu = 1$. When $\mu > 1$, it turns out
that $\pi_0 < 1$, and an equation determining $\pi_0$ may be derived by conditioning on the
number of offspring of the initial individual, as follows:

$$\begin{aligned}
\pi_0 &= P\{\text{population dies out}\} \\
&= \sum_{j=0}^{\infty} P\{\text{population dies out} \mid X_1 = j\} P_j
\end{aligned}$$

Now, given that $X_1 = j$, the population will eventually die out if and only if each of the
$j$ families started by the members of the first generation eventually dies out. Since each
family is assumed to act independently, and since the probability that any particular
family dies out is just $\pi_0$, this yields

$$P\{\text{population dies out} \mid X_1 = j\} = \pi_0^j$$

and thus $\pi_0$ satisfies

$$\pi_0 = \sum_{j=0}^{\infty} \pi_0^j P_j \qquad (4.20)$$

In fact when $\mu > 1$, it can be shown that $\pi_0$ is the smallest positive number satisfying
Equation (4.20).

Example 4.32 If $P_0 = \frac{1}{2}$, $P_1 = \frac{1}{4}$, $P_2 = \frac{1}{4}$, then determine $\pi_0$.

Solution: Since $\mu = \frac{3}{4} \le 1$, it follows that $\pi_0 = 1$. ■

Example 4.33 If $P_0 = \frac{1}{4}$, $P_1 = \frac{1}{4}$, $P_2 = \frac{1}{2}$, then determine $\pi_0$.

Solution: $\pi_0$ satisfies

$$\pi_0 = \frac{1}{4} + \frac{1}{4}\pi_0 + \frac{1}{2}\pi_0^2$$

or

$$2\pi_0^2 - 3\pi_0 + 1 = 0$$

The smallest positive solution of this quadratic equation is $\pi_0 = \frac{1}{2}$. ■

Example 4.34 In Examples 4.32 and 4.33, what is the probability that the population
will die out if it initially consists of $n$ individuals?

Solution: Since the population will die out if and only if the families of each of the
members of the initial generation die out, the desired probability is $\pi_0^n$. For Example
4.32 this yields $\pi_0^n = 1$, and for Example 4.33, $\pi_0^n = \left(\frac{1}{2}\right)^n$. ■
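
Equation (4.20) can also be solved numerically. The sketch below relies on a standard fact that the text does not spell out: starting from $s = 0$ and repeatedly applying $s \mapsto \sum_j P_j s^j$ produces $P\{X_n = 0\}$ after $n$ steps, so the iterates increase to $\pi_0$. The function name and iteration count are arbitrary choices.

```python
def extinction_probability(dist, iterations=500):
    """Approximate pi_0 by iterating s -> sum_j P_j s^j from s = 0."""
    s = 0.0
    for _ in range(iterations):
        s = sum(p * s ** j for j, p in enumerate(dist))
    return s

pi0_a = extinction_probability([0.5, 0.25, 0.25])    # Example 4.32, mu = 3/4
pi0_b = extinction_probability([0.25, 0.25, 0.5])    # Example 4.33, mu = 5/4
print(pi0_a, pi0_b, pi0_b ** 3)                      # last value: 3 initial individuals
```

The iteration settles on the smallest nonnegative root, which is why it returns $\frac{1}{2}$ rather than the other root 1 of the quadratic in Example 4.33.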


4.8 Time Reversible Markov Chains 

Consider a stationary ergodic Markov chain (that is, an ergodic Markov chain that has
been in operation for a long time) having transition probabilities $P_{ij}$ and stationary
probabilities $\pi_i$, and suppose that starting at some time we trace the sequence of states
going backward in time. That is, starting at time $n$, consider the sequence of states
$X_n, X_{n-1}, X_{n-2}, \ldots$. It turns out that this sequence of states is itself a Markov chain
with transition probabilities $Q_{ij}$ defined by

$$\begin{aligned}
Q_{ij} &= P\{X_m = j \mid X_{m+1} = i\} \\
&= \frac{P\{X_m = j, X_{m+1} = i\}}{P\{X_{m+1} = i\}} \\
&= \frac{P\{X_m = j\} P\{X_{m+1} = i \mid X_m = j\}}{P\{X_{m+1} = i\}} \\
&= \frac{\pi_j P_{ji}}{\pi_i}
\end{aligned}$$

To prove that the reversed process is indeed a Markov chain, we must verify that

$$P\{X_m = j \mid X_{m+1} = i, X_{m+2}, X_{m+3}, \ldots\} = P\{X_m = j \mid X_{m+1} = i\}$$

To see that this is so, suppose that the present time is $m + 1$. Now, since $X_0, X_1, X_2, \ldots$ is
a Markov chain, it follows that the conditional distribution of the future $X_{m+2}, X_{m+3}, \ldots$
given the present state $X_{m+1}$ is independent of the past state $X_m$. However, independence
is a symmetric relationship (that is, if $A$ is independent of $B$, then $B$ is independent
of $A$), and so this means that given $X_{m+1}$, $X_m$ is independent of $X_{m+2}, X_{m+3}, \ldots$. But
this is exactly what we had to verify.

Thus, the reversed process is also a Markov chain with transition probabilities given
by

$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}$$

If $Q_{ij} = P_{ij}$ for all $i, j$, then the Markov chain is said to be time reversible. The
condition for time reversibility, namely, $Q_{ij} = P_{ij}$, can also be expressed as

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j \qquad (4.21)$$


The condition in Equation (4.21) can be stated that, for all states $i$ and $j$, the rate at
which the process goes from $i$ to $j$ (namely, $\pi_i P_{ij}$) is equal to the rate at which it
goes from $j$ to $i$ (namely, $\pi_j P_{ji}$). It is worth noting that this is an obvious necessary
condition for time reversibility since a transition from $i$ to $j$ going backward in time
is equivalent to a transition from $j$ to $i$ going forward in time; that is, if $X_m = i$ and
$X_{m-1} = j$, then a transition from $i$ to $j$ is observed if we are looking backward, and
one from $j$ to $i$ if we are looking forward in time. Thus, the rate at which the forward
process makes a transition from $j$ to $i$ is always equal to the rate at which the reverse
process makes a transition from $i$ to $j$; if time reversible, this must equal the rate at
which the forward process makes a transition from $i$ to $j$.

If we can find nonnegative numbers, summing to one, that satisfy Equation (4.21),
then it follows that the Markov chain is time reversible and the numbers represent the
limiting probabilities. This is so since if

$$x_i P_{ij} = x_j P_{ji} \quad \text{for all } i, j, \qquad \sum_i x_i = 1 \qquad (4.22)$$

then summing over $i$ yields

$$\sum_i x_i P_{ij} = x_j \sum_i P_{ji} = x_j, \qquad \sum_i x_i = 1$$

and, because the limiting probabilities $\pi_i$ are the unique solution of the preceding, it
follows that $x_i = \pi_i$ for all $i$.

Example 4.35 Consider a random walk with states $0, 1, \ldots, M$ and transition
probabilities

$$P_{i,i+1} = \alpha_i = 1 - P_{i,i-1}, \quad i = 1, \ldots, M - 1,$$

$$P_{0,1} = \alpha_0 = 1 - P_{0,0},$$

$$P_{M,M} = \alpha_M = 1 - P_{M,M-1}$$

Without the need for any computations, it is possible to argue that this Markov chain,
which can only make transitions from a state to one of its two nearest neighbors, is
time reversible. This follows by noting that the number of transitions from $i$ to $i + 1$
must at all times be within 1 of the number from $i + 1$ to $i$. This is so because between
any two transitions from $i$ to $i + 1$ there must be one from $i + 1$ to $i$ (and conversely)
since the only way to reenter $i$ from a higher state is via state $i + 1$. Hence, it follows
that the rate of transitions from $i$ to $i + 1$ equals the rate from $i + 1$ to $i$, and so the
process is time reversible.

We can easily obtain the limiting probabilities by equating for each state $i =
0, 1, \ldots, M - 1$ the rate at which the process goes from $i$ to $i + 1$ with the rate at
which it goes from $i + 1$ to $i$. This yields

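$$\pi_0 \alpha_0 = \pi_1 (1 - \alpha_1),$$

$$\pi_1 \alpha_1 = \pi_2 (1 - \alpha_2),$$

$$\pi_i \alpha_i = \pi_{i+1} (1 - \alpha_{i+1}), \quad i = 0, 1, \ldots, M - 1$$

Solving in terms of $\pi_0$ yields

$$\pi_1 = \frac{\alpha_0}{1 - \alpha_1} \pi_0,$$

$$\pi_2 = \frac{\alpha_1}{1 - \alpha_2} \pi_1 = \frac{\alpha_1 \alpha_0}{(1 - \alpha_2)(1 - \alpha_1)} \pi_0$$

and, in general,

$$\pi_i = \frac{\alpha_{i-1} \cdots \alpha_0}{(1 - \alpha_i) \cdots (1 - \alpha_1)} \pi_0, \quad i = 1, 2, \ldots, M$$

Since $\sum_{i=0}^{M} \pi_i = 1$, we obtain

$$\pi_0 \left[ 1 + \sum_{j=1}^{M} \frac{\alpha_{j-1} \cdots \alpha_0}{(1 - \alpha_j) \cdots (1 - \alpha_1)} \right] = 1$$

or

$$\pi_0 = \left[ 1 + \sum_{j=1}^{M} \frac{\alpha_{j-1} \cdots \alpha_0}{(1 - \alpha_j) \cdots (1 - \alpha_1)} \right]^{-1} \qquad (4.23)$$

and

$$\pi_i = \frac{\alpha_{i-1} \cdots \alpha_0}{(1 - \alpha_i) \cdots (1 - \alpha_1)} \pi_0, \quad i = 1, \ldots, M \qquad (4.24)$$

For instance, if $\alpha_i \equiv \alpha$, then

$$\pi_0 = \left[ 1 + \sum_{j=1}^{M} \left( \frac{\alpha}{1 - \alpha} \right)^j \right]^{-1} = \frac{1 - \beta}{1 - \beta^{M+1}}$$

and, in general,

$$\pi_i = \frac{\beta^i (1 - \beta)}{1 - \beta^{M+1}}, \quad i = 0, 1, \ldots, M$$

where

$$\beta = \frac{\alpha}{1 - \alpha}$$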

Another special case of Example 4.35 is the following urn model, proposed by the
physicists P. and T. Ehrenfest to describe the movements of molecules. Suppose that $M$
molecules are distributed among two urns; and at each time point one of the molecules
is chosen at random, removed from its urn, and placed in the other one. The number of
molecules in urn I is a special case of the Markov chain of Example 4.35 having

$$\alpha_i = \frac{M - i}{M}, \quad i = 0, 1, \ldots, M$$

Hence, using Equations (4.23) and (4.24) the limiting probabilities in this case are

$$\begin{aligned}
\pi_0 &= \left[ 1 + \sum_{j=1}^{M} \frac{(M - j + 1) \cdots (M - 1) M}{j (j - 1) \cdots 1} \right]^{-1} \\
&= \left[ \sum_{j=0}^{M} \binom{M}{j} \right]^{-1} \\
&= \left( \frac{1}{2} \right)^M
\end{aligned}$$

where we have used the identity

$$1 = \left( \frac{1}{2} + \frac{1}{2} \right)^M = \sum_{j=0}^{M} \binom{M}{j} \left( \frac{1}{2} \right)^M$$

Hence, from Equation (4.24)

$$\pi_i = \binom{M}{i} \left( \frac{1}{2} \right)^M, \quad i = 0, 1, \ldots, M$$

Because the preceding are just the binomial probabilities, it follows that in the long
run, the positions of each of the $M$ balls are independent and each one is equally likely
to be in either urn. This, however, is quite intuitive, for if we focus on any one ball, it
becomes quite clear that its position will be independent of the positions of the other
balls (since no matter where the other $M - 1$ balls are, the ball under consideration at
each stage will be moved with probability $1/M$) and by symmetry, it is equally likely
to be in either urn.
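
Equations (4.23) and (4.24) translate directly into code. As a sketch, the function below takes the list of $\alpha_i$ values (it assumes $\alpha_i < 1$ for $i \ge 1$ so that no division by zero occurs), builds the limiting probabilities with exact fractions, and is then applied to the Ehrenfest urn with $M = 6$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def stationary_from_alphas(alpha):
    """Limiting probabilities via (4.23)-(4.24); alpha[i] = P(i -> i+1)."""
    M = len(alpha) - 1
    weights = [Fraction(1)]                      # pi_i / pi_0, starting with i = 0
    for i in range(1, M + 1):
        weights.append(weights[-1] * alpha[i - 1] / (1 - alpha[i]))
    total = sum(weights)                         # equals 1 / pi_0
    return [w / total for w in weights]

def transition_matrix(alpha):
    """Transition matrix of the random walk on states 0..M."""
    M = len(alpha) - 1
    P = [[Fraction(0)] * (M + 1) for _ in range(M + 1)]
    P[0][1], P[0][0] = alpha[0], 1 - alpha[0]
    for i in range(1, M):
        P[i][i + 1], P[i][i - 1] = alpha[i], 1 - alpha[i]
    P[M][M], P[M][M - 1] = alpha[M], 1 - alpha[M]
    return P

M = 6
alpha = [Fraction(M - i, M) for i in range(M + 1)]    # Ehrenfest urn model
pi = stationary_from_alphas(alpha)
P = transition_matrix(alpha)
print(pi)
```

The resulting probabilities coincide with the binomial values $\binom{M}{i}(1/2)^M$ and satisfy both $\pi \mathbf{P} = \pi$ and the time reversibility equations.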

Figure 4.1 A connected graph with arc weights.


Example 4.36 Consider an arbitrary connected graph (see Section 3.6 for definitions)
having a number $w_{ij}$ associated with arc $(i, j)$ for each arc. One instance of such a
graph is given by Figure 4.1. Now consider a particle moving from node to node in this
manner: If at any time the particle resides at node $i$, then it will next move to node $j$
with probability $P_{ij}$, where

$$P_{ij} = \frac{w_{ij}}{\sum_j w_{ij}}$$

and where $w_{ij}$ is 0 if $(i, j)$ is not an arc. For instance, for the graph of Figure 4.1,
$P_{12} = 3/(3 + 1 + 2) = \frac{1}{2}$.

The time reversibility equations

$$\pi_i P_{ij} = \pi_j P_{ji}$$

reduce to

$$\pi_i \frac{w_{ij}}{\sum_j w_{ij}} = \pi_j \frac{w_{ji}}{\sum_i w_{ji}}$$

or, equivalently, since $w_{ij} = w_{ji}$,

$$\frac{\pi_i}{\sum_j w_{ij}} = \frac{\pi_j}{\sum_i w_{ji}}$$

which is equivalent to

$$\frac{\pi_i}{\sum_j w_{ij}} = c$$

or

$$\pi_i = c \sum_j w_{ij}$$

or, since $1 = \sum_i \pi_i$,

$$\pi_i = \frac{\sum_j w_{ij}}{\sum_i \sum_j w_{ij}}$$

Because the $\pi_i$s given by this equation satisfy the time reversibility equations, it follows
that the process is time reversible with these limiting probabilities.

For the graph of Figure 4.1 we have that

$$\pi_1 = \frac{6}{32}, \quad \pi_2 = \frac{3}{32}, \quad \pi_3 = \frac{6}{32}, \quad \pi_4 = \frac{5}{32}, \quad \pi_5 = \frac{12}{32} \qquad ■$$
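
The formula $\pi_i = \sum_j w_{ij} / \sum_i \sum_j w_{ij}$ is simple to evaluate. Because the figure is not reproduced here, the arc weights below are an assumption: they are one hypothetical set of weights consistent with $P_{12} = 3/(3 + 1 + 2)$ and with the five limiting probabilities quoted above, and the actual figure may differ.

```python
from fractions import Fraction

# Hypothetical arc weights (assumed, not read from Figure 4.1).
WEIGHTS = {(1, 2): 3, (1, 4): 1, (1, 5): 2, (3, 5): 6, (4, 5): 4}

def weight(i, j):
    """Symmetric weight w_ij, or 0 when (i, j) is not an arc."""
    return WEIGHTS.get((i, j), WEIGHTS.get((j, i), 0))

nodes = [1, 2, 3, 4, 5]
row_sum = {i: sum(weight(i, j) for j in nodes) for i in nodes}
total = sum(row_sum.values())                      # sum_i sum_j w_ij

pi = {i: Fraction(row_sum[i], total) for i in nodes}
P = {(i, j): Fraction(weight(i, j), row_sum[i]) for i in nodes for j in nodes}

print(pi[1], pi[5], P[(1, 2)])
```

With these weights each $\pi_i$ is proportional to the total weight on the arcs leaving node $i$, and the time reversibility equations hold for every pair of nodes.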

If we try to solve Equation (4.22) for an arbitrary Markov chain with states $0, 1, \ldots, M$,
it will usually turn out that no solution exists. For example, from Equation (4.22),

$$x_i P_{ij} = x_j P_{ji},$$

$$x_k P_{kj} = x_j P_{jk}$$

implying (if $P_{ij} P_{jk} > 0$) that

$$\frac{x_i}{x_k} = \frac{P_{ji} P_{kj}}{P_{ij} P_{jk}}$$

which in general need not equal $P_{ki}/P_{ik}$. Thus, we see that a necessary condition for
time reversibility is that

$$P_{ik} P_{kj} P_{ji} = P_{ij} P_{jk} P_{ki} \quad \text{for all } i, j, k \qquad (4.25)$$

which is equivalent to the statement that, starting in state $i$, the path $i \to k \to j \to i$
has the same probability as the reversed path $i \to j \to k \to i$. To understand the
necessity of this, note that time reversibility implies that the rate at which a sequence
of transitions from $i$ to $k$ to $j$ to $i$ occurs must equal the rate of ones from $i$ to $j$ to $k$ to
$i$ (why?), and so we must have

$$\pi_i P_{ik} P_{kj} P_{ji} = \pi_i P_{ij} P_{jk} P_{ki}$$

implying Equation (4.25) when $\pi_i > 0$.
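
Condition (4.25) is easy to test mechanically on a small chain. The sketch below checks every triple of states for two illustrative three-state chains, both made up for this purpose: a random walk on a triangle with symmetric arc weights, and a chain that prefers to circulate in one direction. Passing the check is only a necessary condition for time reversibility.

```python
from fractions import Fraction as F
from itertools import product

def satisfies_triple_condition(P):
    """True if P_ik P_kj P_ji == P_ij P_jk P_ki for every triple of states."""
    n = len(P)
    return all(P[i][k] * P[k][j] * P[j][i] == P[i][j] * P[j][k] * P[k][i]
               for i, j, k in product(range(n), repeat=3))

# Random walk on a triangle with symmetric weights w01 = 1, w02 = 2, w12 = 3.
graph_walk = [[F(0), F(1, 3), F(2, 3)],
              [F(1, 4), F(0), F(3, 4)],
              [F(2, 5), F(3, 5), F(0)]]

# A chain that favours the cycle 0 -> 1 -> 2 -> 0 over its reversal.
cyclic = [[F(0), F(7, 10), F(3, 10)],
          [F(3, 10), F(0), F(7, 10)],
          [F(7, 10), F(3, 10), F(0)]]

print(satisfies_triple_condition(graph_walk), satisfies_triple_condition(cyclic))
```

The cyclic chain fails because the loop $0 \to 1 \to 2 \to 0$ has probability $0.7^3$ while its reversal has probability $0.3^3$.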

In fact, we can show the following. 

Theorem 4.2 An ergodic Markov chain for which $P_{ij} = 0$ whenever $P_{ji} = 0$ is time
reversible if and only if starting in state $i$, any path back to $i$ has the same probability
as the reversed path. That is, if

$$P_{i,i_1} P_{i_1,i_2} \cdots P_{i_k,i} = P_{i,i_k} P_{i_k,i_{k-1}} \cdots P_{i_1,i} \qquad (4.26)$$

for all states $i, i_1, \ldots, i_k$.

Proof. We have already proven necessity. To prove sufficiency, fix states $i$ and $j$ and
rewrite (4.26) as

$$P_{i,i_1} P_{i_1,i_2} \cdots P_{i_k,j} P_{ji} = P_{ij} P_{j,i_k} \cdots P_{i_1,i}$$

Summing the preceding over all states $i_1, \ldots, i_k$ yields

$$P_{ij}^{k+1} P_{ji} = P_{ij} P_{ji}^{k+1}$$

Letting $k \to \infty$ yields

$$\pi_j P_{ji} = P_{ij} \pi_i$$

which proves the theorem. ■

Example 4.37 Suppose we are given a set of $n$ elements, numbered 1 through $n$,
which are to be arranged in some ordered list. At each unit of time a request is made
to retrieve one of these elements, element $i$ being requested (independently of the
past) with probability $P_i$. After being requested, the element then is put back but not
necessarily in the same position. In fact, let us suppose that the element requested is
moved one closer to the front of the list; for instance, if the present list ordering is 1,
3, 4, 2, 5 and element 2 is requested, then the new ordering becomes 1, 3, 2, 4, 5. We
are interested in the long-run average position of the element requested.

For any given probability vector $\mathbf{P} = (P_1, \ldots, P_n)$, the preceding can be modeled
as a Markov chain with $n!$ states, with the state at any time being the list order at that
time. We shall show that this Markov chain is time reversible and then use this to show
that the average position of the element requested when this one-closer rule is in effect
is less than when the rule of always moving the requested element to the front of the
line is used. The time reversibility of the resulting Markov chain when the one-closer
reordering rule is in effect easily follows from Theorem 4.2. For instance, suppose
$n = 3$ and consider the following path from state (1, 2, 3) to itself:

$$(1, 2, 3) \to (2, 1, 3) \to (2, 3, 1) \to (3, 2, 1) \to (3, 1, 2) \to (1, 3, 2) \to (1, 2, 3)$$

The product of the transition probabilities in the forward direction is

$$P_2 P_3 P_3 P_1 P_1 P_2 = P_1^2 P_2^2 P_3^2$$

whereas in the reverse direction, it is

$$P_3 P_3 P_2 P_2 P_1 P_1 = P_1^2 P_2^2 P_3^2$$

Because the general result follows in much the same manner, the Markov chain is
indeed time reversible. (For a formal argument note that if $f_i$ denotes the number of
times element $i$ moves forward in the path, then as the path goes from a fixed state back
to itself, it follows that element $i$ will also move backward $f_i$ times. Therefore, since
the backward moves of element $i$ are precisely the times that it moves forward in the
reverse path, it follows that the product of the transition probabilities for both the path
and its reversal will equal

$$\prod_i P_i^{f_i + r_i}$$

where $r_i$ is equal to the number of times that element $i$ is in the first position and the
path (or the reverse path) does not change states.)

For any permutation $i_1, i_2, \ldots, i_n$ of $1, 2, \ldots, n$, let $\pi(i_1, i_2, \ldots, i_n)$ denote the
limiting probability under the one-closer rule. By time reversibility we have

$$P_{i_{j+1}} \pi(i_1, \ldots, i_j, i_{j+1}, \ldots, i_n) = P_{i_j} \pi(i_1, \ldots, i_{j+1}, i_j, \ldots, i_n) \qquad (4.27)$$

for all permutations.

Now the average position of the element requested can be expressed (as in Section
3.6.1) as

$$\begin{aligned}
\text{Average position} &= \sum_i P_i E[\text{Position of element } i] \\
&= \sum_i P_i \left[ 1 + \sum_{j \ne i} P\{\text{element } j \text{ precedes element } i\} \right] \\
&= 1 + \sum_i \sum_{j \ne i} P_i P\{e_j \text{ precedes } e_i\} \\
&= 1 + \sum_{i<j} \left[ P_i P\{e_j \text{ precedes } e_i\} + P_j P\{e_i \text{ precedes } e_j\} \right] \\
&= 1 + \sum_{i<j} \left[ P_i P\{e_j \text{ precedes } e_i\} + P_j (1 - P\{e_j \text{ precedes } e_i\}) \right] \\
&= 1 + \sum\sum_{i<j} (P_i - P_j) P\{e_j \text{ precedes } e_i\} + \sum\sum_{i<j} P_j
\end{aligned}$$

Hence, to minimize the average position of the element requested, we would want to
make $P\{e_j \text{ precedes } e_i\}$ as large as possible when $P_j > P_i$ and as small as possible
when $P_i > P_j$. Under the front-of-the-line rule we showed in Section 3.6.1,

$$P\{e_j \text{ precedes } e_i\} = \frac{P_j}{P_j + P_i}$$

(since under the front-of-the-line rule element $j$ will precede element $i$ if and only if
the last request for either $i$ or $j$ was for $j$).

Therefore, to show that the one-closer rule is better than the front-of-the-line rule,
it suffices to show that under the one-closer rule

$$P\{e_j \text{ precedes } e_i\} > \frac{P_j}{P_j + P_i} \quad \text{when } P_j > P_i$$

Now consider any state where element $i$ precedes element $j$, say, $(\ldots, i, i_1, \ldots, i_k,
j, \ldots)$. By successive transpositions using Equation (4.27), we have

$$\pi(\ldots, i, i_1, \ldots, i_k, j, \ldots) = \left( \frac{P_i}{P_j} \right)^{k+1} \pi(\ldots, j, i_1, \ldots, i_k, i, \ldots) \qquad (4.28)$$

For instance,

$$\begin{aligned}
\pi(1, 2, 3) &= \frac{P_2}{P_3} \pi(1, 3, 2) = \frac{P_2}{P_3} \frac{P_1}{P_3} \pi(3, 1, 2) \\
&= \frac{P_2}{P_3} \frac{P_1}{P_3} \frac{P_1}{P_2} \pi(3, 2, 1) = \left( \frac{P_1}{P_3} \right)^2 \pi(3, 2, 1)
\end{aligned}$$

Now when $P_j > P_i$, Equation (4.28) implies that

$$\pi(\ldots, i, i_1, \ldots, i_k, j, \ldots) < \frac{P_i}{P_j} \pi(\ldots, j, i_1, \ldots, i_k, i, \ldots)$$

Letting $\alpha(i, j) = P\{e_i \text{ precedes } e_j\}$, we see by summing over all states for which $i$
precedes $j$ and by using the preceding that

$$\alpha(i, j) < \frac{P_i}{P_j} \alpha(j, i)$$

which, since $\alpha(i, j) = 1 - \alpha(j, i)$, yields

$$\alpha(j, i) > \frac{P_j}{P_j + P_i}$$

Hence, the average position of the element requested is indeed smaller under the
one-closer rule than under the front-of-the-line rule. ■
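
For a small list the comparison can be carried out numerically. The sketch below is an illustration with an arbitrarily chosen request vector $\mathbf{P} = (0.5, 0.3, 0.2)$. It finds the limiting probabilities of the $n!$-state one-closer chain by repeatedly applying the transition rule to a probability vector (a swapped-in numerical technique, not the argument used in the text) and compares the resulting average position with the front-of-the-line value $1 + \sum_i \sum_{j \ne i} P_i P_j/(P_j + P_i)$. Elements are numbered from 0 in the code.

```python
from itertools import permutations

def one_closer_stationary(P, iterations=5000):
    """Limiting probabilities of list orderings under the one-closer rule."""
    states = list(permutations(range(len(P))))
    index = {s: k for k, s in enumerate(states)}
    dist = [1.0 / len(states)] * len(states)
    for _ in range(iterations):
        new = [0.0] * len(states)
        for s, mass in zip(states, dist):
            for pos, element in enumerate(s):
                nxt = list(s)
                if pos > 0:      # requested element moves one place forward
                    nxt[pos - 1], nxt[pos] = nxt[pos], nxt[pos - 1]
                new[index[tuple(nxt)]] += mass * P[element]
        dist = new
    return dict(zip(states, dist))

def average_position(P, stationary):
    """Long-run average position (1-based) of the requested element."""
    return sum(prob * sum((pos + 1) * P[e] for pos, e in enumerate(s))
               for s, prob in stationary.items())

def front_of_line_average(P):
    """Average position when P{e_j precedes e_i} = P_j / (P_j + P_i)."""
    n = len(P)
    return 1 + sum(P[i] * P[j] / (P[i] + P[j])
                   for i in range(n) for j in range(n) if j != i)

P = [0.5, 0.3, 0.2]
pi = one_closer_stationary(P)
one_closer = average_position(P, pi)
front = front_of_line_average(P)
print(one_closer, front)
```

For this request vector the one-closer average is about 1.87 against about 1.90 for the front-of-the-line rule, in line with the conclusion of the example.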


The concept of the reversed chain is useful even when the process is not time 
reversible. To illustrate this, we start with the following proposition whose proof is left 
as an exercise. 

Proposition 4.9 Consider an irreducible Markov chain with transition probabilities
$P_{ij}$. If we can find positive numbers $\pi_i$, $i \ge 0$, summing to one, and a transition
probability matrix $\mathbf{Q} = [Q_{ij}]$ such that

$$\pi_i P_{ij} = \pi_j Q_{ji} \qquad (4.29)$$

then the $Q_{ij}$ are the transition probabilities of the reversed chain and the $\pi_i$ are the
stationary probabilities both for the original and reversed chain.

The importance of the preceding proposition is that, by thinking backward, we can
sometimes guess at the nature of the reversed chain and then use the set of Equations
(4.29) to obtain both the stationary probabilities and the $Q_{ij}$.

Example 4.38 A single bulb is necessary to light a given room. When the bulb in use
fails, it is replaced by a new one at the beginning of the next day. Let $X_n$ equal $i$ if the
bulb in use at the beginning of day $n$ is in its $i$th day of use (that is, if its present age is
$i$). For instance, if a bulb fails on day $n - 1$, then a new bulb will be put in use at the
beginning of day $n$ and so $X_n = 1$. If we suppose that each bulb, independently, fails
on its $i$th day of use with probability $p_i$, $i \ge 1$, then it is easy to see that $\{X_n, n \ge 1\}$
is a Markov chain whose transition probabilities are as follows:

$$\begin{aligned}
P_{i,1} &= P\{\text{bulb, on its } i\text{th day of use, fails}\} \\
&= P\{\text{life of bulb} = i \mid \text{life of bulb} \ge i\} \\
&= \frac{P\{L = i\}}{P\{L \ge i\}}
\end{aligned}$$

where $L$, a random variable representing the lifetime of a bulb, is such that $P\{L =
i\} = p_i$. Also,

$$P_{i,i+1} = 1 - P_{i,1}$$

Suppose now that this chain has been in operation for a long (in theory, an infinite)
time and consider the sequence of states going backward in time. Since, in the forward
direction, the state is always increasing by 1 until it reaches the age at which the item
fails, it is easy to see that the reverse chain will always decrease by 1 until it reaches 1
and then it will jump to a random value representing the lifetime of the (in real time)
previous bulb. Thus, it seems that the reverse chain should have transition probabilities
given by

$$Q_{i,i-1} = 1, \quad i > 1$$

$$Q_{1,i} = p_i, \quad i \ge 1$$

To check this, and at the same time determine the stationary probabilities, we must see
if we can find, with the $Q_{i,j}$ as previously given, positive numbers $\{\pi_i\}$ such that

$$\pi_i P_{i,j} = \pi_j Q_{j,i}$$

To begin, let $j = 1$ and consider the resulting equations:

$$\pi_i P_{i,1} = \pi_1 Q_{1,i}$$

This is equivalent to

$$\pi_i \frac{P\{L = i\}}{P\{L \ge i\}} = \pi_1 P\{L = i\}$$

or

$$\pi_i = \pi_1 P\{L \ge i\}$$

Summing over all $i$ yields

$$1 = \sum_{i=1}^{\infty} \pi_i = \pi_1 \sum_{i=1}^{\infty} P\{L \ge i\} = \pi_1 E[L]$$

and so, for the preceding $Q_{ij}$ to represent the reverse transition probabilities, it is
necessary for the stationary probabilities to be

$$\pi_i = \frac{P\{L \ge i\}}{E[L]}, \quad i \ge 1$$

To finish the proof that the reverse transition probabilities and stationary probabilities
are as given, all that remains is to show that they satisfy

$$\pi_i P_{i,i+1} = \pi_{i+1} Q_{i+1,i}$$

which is equivalent to

$$\frac{P\{L \ge i\}}{E[L]} \left( 1 - \frac{P\{L = i\}}{P\{L \ge i\}} \right) = \frac{P\{L \ge i + 1\}}{E[L]}$$

and which is true since $P\{L \ge i\} - P\{L = i\} = P\{L \ge i + 1\}$.
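
The guess-and-verify argument of this example can be replayed on a concrete case. The sketch below assumes a hypothetical lifetime distribution on $\{1, 2, 3\}$, builds the forward chain, the guessed reverse chain, and the candidate $\pi_i = P\{L \ge i\}/E[L]$, and then checks Equation (4.29) for every pair of states using exact fractions.

```python
from fractions import Fraction as F

p = [F(1, 2), F(1, 4), F(1, 4)]            # hypothetical P{L = i}, i = 1, 2, 3
n = len(p)
tail = [sum(p[i:]) for i in range(n)]      # tail[i - 1] = P{L >= i}
mean_life = sum(tail)                      # E[L] = sum_i P{L >= i}
pi = [t / mean_life for t in tail]         # candidate stationary probabilities

# Forward chain on ages 1..n (list index i stands for age i + 1).
P = [[F(0)] * n for _ in range(n)]
for i in range(n):
    fail = p[i] / tail[i]                  # P{L = i} / P{L >= i}
    P[i][0] += fail                        # bulb fails, new bulb has age 1
    if i + 1 < n:
        P[i][i + 1] += 1 - fail            # bulb survives, age increases by 1

# Guessed reverse chain: from age 1 jump to age i with probability p_i,
# from any other age step down by 1.
Q = [[F(0)] * n for _ in range(n)]
Q[0] = list(p)
for i in range(1, n):
    Q[i][i - 1] = F(1)

balanced = all(pi[i] * P[i][j] == pi[j] * Q[j][i]
               for i in range(n) for j in range(n))
print(pi, balanced)
```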


4.9 Markov Chain Monte Carlo Methods

Let $\mathbf{X}$ be a discrete random vector whose set of possible values is $\mathbf{x}_j$, $j \ge 1$. Let the
probability mass function of $\mathbf{X}$ be given by $P\{\mathbf{X} = \mathbf{x}_j\}$, $j \ge 1$, and suppose that we
are interested in calculating

$$\theta = E[h(\mathbf{X})] = \sum_{j=1}^{\infty} h(\mathbf{x}_j) P\{\mathbf{X} = \mathbf{x}_j\}$$

for some specified function $h$. In situations where it is computationally difficult to
evaluate the function $h(\mathbf{x}_j)$, $j \ge 1$, we often turn to simulation to approximate $\theta$.
The usual approach, called Monte Carlo simulation, is to use random numbers to
generate a partial sequence of independent and identically distributed random vectors
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ having the mass function $P\{\mathbf{X} = \mathbf{x}_j\}$, $j \ge 1$ (see Chapter 11 for a
discussion as to how this can be accomplished). Since the strong law of large numbers
yields

$$\lim_{n \to \infty} \sum_{i=1}^{n} \frac{h(\mathbf{X}_i)}{n} = \theta \qquad (4.30)$$

it follows that we can estimate $\theta$ by letting $n$ be large and using the average of the
values of $h(\mathbf{X}_i)$, $i = 1, \ldots, n$ as the estimator.
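
A bare-bones version of this estimator is sketched below. The random vector (a pair of fair dice) and the function $h$ (their product, for which $\theta = 3.5^2 = 12.25$) are assumptions chosen only to give a case where the answer is known. The seed and sample size are arbitrary.

```python
import random

def monte_carlo_estimate(sample, h, n, rng):
    """Average h over n independent draws produced by sample(rng)."""
    return sum(h(sample(rng)) for _ in range(n)) / n

def two_dice(rng):
    """One draw of the random vector X = (X1, X2), two independent fair dice."""
    return (rng.randint(1, 6), rng.randint(1, 6))

rng = random.Random(12345)
estimate = monte_carlo_estimate(two_dice, lambda x: x[0] * x[1], 200_000, rng)
print(estimate)   # close to theta = 12.25 for large n
```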

It often, however, turns out that it is difficult to generate a random vector having the
specified probability mass function, particularly if $\mathbf{X}$ is a vector of dependent random
variables. In addition, its probability mass function is sometimes given in the form
$P\{\mathbf{X} = \mathbf{x}_j\} = C b_j$, $j \ge 1$, where the $b_j$ are specified, but $C$ must be computed, and
in many applications it is not computationally feasible to sum the $b_j$ so as to determine
$C$. Fortunately, however, there is another way of using simulation to estimate $\theta$ in these
situations. It works by generating a sequence, not of independent random vectors, but
of the successive states of a vector-valued Markov chain $\mathbf{X}_1, \mathbf{X}_2, \ldots$ whose stationary
probabilities are $P\{\mathbf{X} = \mathbf{x}_j\}$, $j \ge 1$. If this can be accomplished, then it would follow
from Proposition 4.7 that Equation (4.30) remains valid, implying that we can then use
$\sum_{i=1}^{n} h(\mathbf{X}_i)/n$ as an estimator of $\theta$.

We now show how to generate a Markov chain with arbitrary stationary probabilities
that may only be specified up to a multiplicative constant. Let $b(j)$, $j = 1, 2, \ldots$ be
positive numbers whose sum $B = \sum_{j=1}^{\infty} b(j)$ is finite. The following, known as the
Hastings-Metropolis algorithm, can be used to generate a time reversible Markov chain
whose stationary probabilities are

$$\pi(j) = b(j)/B, \quad j = 1, 2, \ldots$$

To begin, let $\mathbf{Q}$ be any specified irreducible Markov transition probability matrix on
the integers, with $q(i, j)$ representing the row $i$ column $j$ element of $\mathbf{Q}$. Now define
a Markov chain $\{X_n, n \ge 0\}$ as follows. When $X_n = i$, generate a random variable
$Y$ such that $P\{Y = j\} = q(i, j)$, $j = 1, 2, \ldots$. If $Y = j$, then set $X_{n+1}$ equal to $j$
with probability $\alpha(i, j)$, and set it equal to $i$ with probability $1 - \alpha(i, j)$. Under these
conditions, it is easy to see that the sequence of states constitutes a Markov chain with
transition probabilities $P_{i,j}$ given by

$$P_{i,j} = q(i, j)\alpha(i, j), \quad \text{if } j \ne i$$

$$P_{i,i} = q(i, i) + \sum_{k \ne i} q(i, k)(1 - \alpha(i, k))$$

This Markov chain will be time reversible and have stationary probabilities $\pi(j)$ if

$$\pi(i)P_{i,j} = \pi(j)P_{j,i} \quad \text{for } j \ne i$$

which is equivalent to

$$\pi(i)q(i, j)\alpha(i, j) = \pi(j)q(j, i)\alpha(j, i) \tag{4.31}$$

But if we take $\pi(j) = b(j)/B$ and set

$$\alpha(i, j) = \min\left(\frac{\pi(j)q(j, i)}{\pi(i)q(i, j)}, 1\right) \tag{4.32}$$

then Equation (4.31) is easily seen to be satisfied. For if

$$\alpha(i, j) = \frac{\pi(j)q(j, i)}{\pi(i)q(i, j)}$$

then $\alpha(j, i) = 1$ and Equation (4.31) follows, and if $\alpha(i, j) = 1$ then

$$\alpha(j, i) = \frac{\pi(i)q(i, j)}{\pi(j)q(j, i)}$$

and again Equation (4.31) holds, thus showing that the Markov chain is time reversible
with stationary probabilities $\pi(j)$. Also, since $\pi(j) = b(j)/B$, we see from (4.32) that

$$\alpha(i, j) = \min\left(\frac{b(j)q(j, i)}{b(i)q(i, j)}, 1\right)$$

which shows that the value of $B$ is not needed to define the Markov chain, because the
values $b(j)$ suffice. Also, it is almost always the case that $\pi(j)$, $j \ge 1$ will not only
be stationary probabilities but will also be limiting probabilities. (Indeed, a sufficient
condition is that $P_{i,i} > 0$ for some $i$.)
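
The algorithm just described can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name hastings_metropolis, the target b(j) = j on the states 1, ..., 5, and the uniform proposal matrix are all assumptions chosen for the illustration. Note that the normalizing constant B (here 15) is never used.

```python
import random

def hastings_metropolis(b, propose, q, start, steps, rng):
    """Return the states X_1, ..., X_steps of the chain.

    b(j)            -- unnormalized target weight, b(j) > 0
    propose(i, rng) -- draws a candidate Y with P{Y = j} = q(i, j)
    q(i, j)         -- proposal probability q(i, j)
    """
    state = start
    path = []
    for _ in range(steps):
        cand = propose(state, rng)
        # alpha(i, j) = min(b(j) q(j, i) / (b(i) q(i, j)), 1); B cancels.
        alpha = min(b(cand) * q(cand, state) / (b(state) * q(state, cand)), 1.0)
        if rng.random() < alpha:
            state = cand          # candidate accepted
        path.append(state)        # otherwise the chain stays where it is
    return path

# Hypothetical target: b(j) = j on states 1..5, so pi(j) = j / 15.
STATES = [1, 2, 3, 4, 5]

def uniform_propose(i, rng):
    # Assumed proposal: the candidate is uniform over all states.
    return rng.choice(STATES)

def uniform_q(i, j):
    return 1.0 / len(STATES)

def estimate_pi(steps=100_000, seed=1):
    """Long-run fraction of time spent in each state."""
    rng = random.Random(seed)
    path = hastings_metropolis(lambda j: float(j), uniform_propose,
                               uniform_q, 1, steps, rng)
    return {j: path.count(j) / steps for j in STATES}
```

Calling estimate_pi() should give frequencies close to j/15 for each state j, in line with the claim that the stationary probabilities are also limiting probabilities here (the chain can remain in place, so P_{i,i} > 0).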

Example 4.39 Suppose that we want to generate a uniformly distributed element
in $\mathcal{S}$, the set of all permutations $(x_1, \ldots, x_n)$ of the numbers $(1, \ldots, n)$ for which
$\sum_{j=1}^{n} j x_j > a$ for a given constant $a$. To utilize the Hastings-Metropolis algorithm
we need to define an irreducible Markov transition probability matrix on the state
space $\mathcal{S}$. To accomplish this, we first define a concept of "neighboring" elements of
$\mathcal{S}$, and then construct a graph whose vertex set is $\mathcal{S}$. We start by putting an arc between
each pair of neighboring elements in $\mathcal{S}$, where any two permutations in $\mathcal{S}$ are said
to be neighbors if one results from an interchange of two of the positions of the other.
That is, (1, 2, 3, 4) and (1, 2, 4, 3) are neighbors whereas (1, 2, 3, 4) and (1, 3, 4, 2) are
not. Now, define the $q$ transition probability function as follows. With $N(s)$ defined as
the set of neighbors of $s$, and $|N(s)|$ equal to the number of elements in the set $N(s)$,
let

$$q(s, t) = \frac{1}{|N(s)|} \quad \text{if } t \in N(s)$$

That is, the candidate next state from $s$ is equally likely to be any of its neighbors. Since
the desired limiting probabilities of the Markov chain are $\pi(s) = C$, it follows that
$\pi(s) = \pi(t)$, and so

$$\alpha(s, t) = \min(|N(s)|/|N(t)|, 1)$$

That is, if the present state of the Markov chain is $s$ then one of its neighbors is randomly
chosen, say, $t$. If $t$ is a state with fewer neighbors than $s$ (in graph theory language, if the
degree of vertex $t$ is less than that of vertex $s$), then the next state is $t$. If not, a uniform
(0, 1) random number $U$ is generated and the next state is $t$ if $U < |N(s)|/|N(t)|$ and is
$s$ otherwise. The limiting probabilities of this Markov chain are $\pi(s) = 1/|\mathcal{S}|$, where
$|\mathcal{S}|$ is the (unknown) number of permutations in $\mathcal{S}$. ■
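
One transition of the chain in Example 4.39 might be coded as follows. This is a sketch under assumptions: the helper names weight, neighbors, and step are made up for the illustration, neighbors are found by brute force over all pairs of positions, and, as in the example, the neighbor graph on the constrained set is taken to be connected.

```python
import random
from itertools import combinations

def weight(perm):
    """sum_j j * x_j with positions j = 1, ..., n."""
    return sum((j + 1) * x for j, x in enumerate(perm))

def neighbors(perm, a):
    """All permutations in the constrained set reachable by one interchange."""
    out = []
    for i, k in combinations(range(len(perm)), 2):
        t = list(perm)
        t[i], t[k] = t[k], t[i]
        if weight(t) > a:          # keep only elements of the state space
            out.append(tuple(t))
    return out

def step(perm, a, rng):
    """One Hastings-Metropolis transition with alpha = min(|N(s)|/|N(t)|, 1)."""
    ns = neighbors(perm, a)
    if not ns:                     # isolated state: nothing to propose
        return perm
    t = rng.choice(ns)             # candidate is uniform over N(s)
    nt = neighbors(t, a)           # nonempty, since perm is a neighbor of t
    if rng.random() < min(len(ns) / len(nt), 1.0):
        return t
    return perm
```

Starting from any permutation with weight above a, for instance the identity when a is below its weight, repeated calls to step keep the chain inside the constrained set.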

The most widely used version of the Hastings-Metropolis algorithm is the Gibbs
sampler. Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a discrete random vector with probability mass
function $p(\mathbf{x})$ that is only specified up to a multiplicative constant, and suppose that
we want to generate a random vector whose distribution is that of $\mathbf{X}$. That is, we want
to generate a random vector having mass function

$$p(\mathbf{x}) = C g(\mathbf{x})$$

where $g(\mathbf{x})$ is known, but $C$ is not. Utilization of the Gibbs sampler assumes that for
any $i$ and values $x_j$, $j \ne i$, we can generate a random variable $X$ having the probability
mass function

$$P\{X = x\} = P\{X_i = x \mid X_j = x_j, j \ne i\}$$


It operates by using the Hastings-Metropolis algorithm on a Markov chain with states
$\mathbf{x} = (x_1, \ldots, x_n)$, and with transition probabilities defined as follows. Whenever the
present state is $\mathbf{x}$, a coordinate that is equally likely to be any of $1, \ldots, n$ is chosen.
If coordinate $i$ is chosen, then a random variable $X$ with probability mass function
$P\{X = x\} = P\{X_i = x \mid X_j = x_j, j \ne i\}$ is generated. If $X = x$, then the state
$\mathbf{y} = (x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n)$ is considered as the candidate next state. In other
words, with $\mathbf{x}$ and $\mathbf{y}$ as given, the Gibbs sampler uses the Hastings-Metropolis algorithm
with

$$q(\mathbf{x}, \mathbf{y}) = \frac{1}{n} P\{X_i = x \mid X_j = x_j, j \ne i\} = \frac{p(\mathbf{y})}{n P\{X_j = x_j, j \ne i\}}$$

Because we want the limiting mass function to be $p$, we see from Equation (4.32) that
the vector $\mathbf{y}$ is then accepted as the new state with probability

$$\alpha(\mathbf{x}, \mathbf{y}) = \min\left(\frac{p(\mathbf{y})q(\mathbf{y}, \mathbf{x})}{p(\mathbf{x})q(\mathbf{x}, \mathbf{y})}, 1\right) = \min\left(\frac{p(\mathbf{y})p(\mathbf{x})}{p(\mathbf{x})p(\mathbf{y})}, 1\right) = 1$$

Hence, when utilizing the Gibbs sampler, the candidate state is always accepted as the 
next state of the chain. 

Example 4.40 Suppose that we want to generate $n$ uniformly distributed points in
the circle of radius 1 centered at the origin, conditional on the event that no two points
are within a distance $d$ of each other, when the probability of this conditioning event is
small. This can be accomplished by using the Gibbs sampler as follows. Start with any
$n$ points $x_1, \ldots, x_n$ in the circle that have the property that no two of them are within $d$
of the other; then generate the value of $I$, equally likely to be any of the values $1, \ldots, n$.
Then continually generate a random point in the circle until you obtain one that is not
within $d$ of any of the other $n - 1$ points excluding $x_I$. At this point, replace $x_I$ by the
generated point and then repeat the operation. After a large number of iterations of this
algorithm, the set of $n$ points will approximately have the desired distribution. ■
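
A minimal Python sketch of one iteration of this procedure is shown below. The function names random_point_in_disk and gibbs_step are assumptions for the illustration, and the uniform point in the circle is obtained by rejection from the enclosing square, which is one common choice rather than anything prescribed by the example.

```python
import math
import random

def random_point_in_disk(rng):
    """Uniform point in the unit disk, by rejection from the square [-1, 1]^2."""
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return (x, y)

def gibbs_step(points, d, rng):
    """Replace one randomly chosen point, keeping all pairs at least d apart."""
    i = rng.randrange(len(points))            # the index I of the example
    while True:
        cand = random_point_in_disk(rng)
        # Compare only with the other n - 1 points, excluding points[i].
        if all(math.dist(cand, p) >= d
               for k, p in enumerate(points) if k != i):
            points[i] = cand
            return
```

The list points is updated in place, so the sampler is run by calling gibbs_step repeatedly on a valid starting configuration.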

Example 4.41 Let $X_i$, $i = 1, \ldots, n$, be independent exponential random variables
with respective rates $\lambda_i$, $i = 1, \ldots, n$. Let $S = \sum_{i=1}^{n} X_i$, and suppose that we want
to generate the random vector $\mathbf{X} = (X_1, \ldots, X_n)$, conditional on the event that $S > c$
for some large positive constant $c$. That is, we want to generate the value of a random
vector whose density function is

$$f(x_1, \ldots, x_n) = \frac{1}{P\{S > c\}} \prod_{i=1}^{n} \lambda_i e^{-\lambda_i x_i}, \quad x_i \ge 0, \ \sum_{i=1}^{n} x_i > c$$

This is easily accomplished by starting with an initial vector $\mathbf{x} = (x_1, \ldots, x_n)$ satisfying
$x_i > 0$, $i = 1, \ldots, n$, $\sum_{i=1}^{n} x_i > c$. Then generate a random variable $I$ that is equally
likely to be any of $1, \ldots, n$. Next, generate an exponential random variable $X$ with
rate $\lambda_I$ conditional on the event that $X + \sum_{j \ne I} x_j > c$. This latter step, which calls
for generating the value of an exponential random variable given that it exceeds
$c - \sum_{j \ne I} x_j$, is easily accomplished by using the fact that an exponential conditioned to
be greater than a positive constant is distributed as the constant plus the exponential.
Consequently, to obtain $X$, first generate an exponential random variable $Y$ with rate
$\lambda_I$, and then set

$$X = Y + \left(c - \sum_{j \ne I} x_j\right)^{+}$$

where $a^{+} = \max(a, 0)$.
The value of $x_I$ should then be reset as $X$ and a new iteration of the algorithm
begun. ■
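
The update of Example 4.41 translates almost line for line into code. The sketch below assumes the hypothetical function name gibbs_step and uses random.expovariate from the Python standard library to generate the exponential Y.

```python
import random

def gibbs_step(x, rates, c, rng):
    """One Gibbs update of x (in place), conditional on sum(x) > c."""
    i = rng.randrange(len(x))              # the index I, uniform on 0..n-1
    rest = sum(x) - x[i]                   # sum of the other coordinates
    y = rng.expovariate(rates[i])          # exponential with rate lambda_I
    # X = Y + (c - sum_{j != I} x_j)^+ : shift only if a shift is needed.
    x[i] = y + max(c - rest, 0.0)
```

Starting from any vector with positive entries whose sum exceeds c, every call leaves the constraint satisfied, since the new coordinate is at least the amount still missing to reach c.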

Remark As can be seen from Examples 4.40 and 4.41, although the theory for the Gibbs
sampler was presented under the assumption that the distribution to be generated was
discrete, it also holds when this distribution is continuous.


4.10 Markov Decision Processes 

Consider a process that is observed at discrete time points to be in any one of M possible 
states, which we number by 1, 2, ..., M. After observing the state of the process, an 
action must be chosen, and we let A, assumed finite, denote the set of all possible 
actions. 

If the process is in state $i$ at time $n$ and action $a$ is chosen, then the next state of
the system is determined according to the transition probabilities $P_{ij}(a)$. If we let $X_n$
denote the state of the process at time $n$ and $a_n$ the action chosen at time $n$, then the
preceding is equivalent to stating that

$$P\{X_{n+1} = j \mid X_0, a_0, X_1, a_1, \ldots, X_n = i, a_n = a\} = P_{ij}(a)$$

Thus, the transition probabilities are functions only of the present state and the
subsequent action.

By a policy, we mean a rule for choosing actions. We shall restrict ourselves to
policies that are of the form that the action they prescribe at any time depends only
on the state of the process at that time (and not on any information concerning prior
states and actions). However, we shall allow the policy to be "randomized" in that
its instructions may be to choose actions according to a probability distribution. In
other words, a policy $\beta$ is a set of numbers $\beta = \{\beta_i(a), a \in A, i = 1, \ldots, M\}$ with
the interpretation that if the process is in state $i$, then action $a$ is to be chosen with
probability $\beta_i(a)$. Of course, we need have

$$0 \le \beta_i(a) \le 1, \quad \text{for all } i, a$$

$$\sum_a \beta_i(a) = 1, \quad \text{for all } i$$

Under any given policy $\beta$, the sequence of states $\{X_n, n = 0, 1, \ldots\}$ constitutes a
Markov chain with transition probabilities $P_{ij}(\beta)$ given by*

$$P_{ij}(\beta) = P_\beta\{X_{n+1} = j \mid X_n = i\} = \sum_a P_{ij}(a)\beta_i(a)$$

where the last equality follows by conditioning on the action chosen when in
state $i$. Let us suppose that for every choice of a policy $\beta$, the resultant Markov chain
$\{X_n, n = 0, 1, \ldots\}$ is ergodic.

* We use the notation $P_\beta$ to signify that the probability is conditional on the fact that policy $\beta$ is used.
For any policy $\beta$, let $\pi_{ia}$ denote the limiting (or steady-state) probability that the
process will be in state $i$ and action $a$ will be chosen if policy $\beta$ is employed. That is,

$$\pi_{ia} = \lim_{n \to \infty} P_\beta\{X_n = i, a_n = a\}$$

The vector $\pi = (\pi_{ia})$ must satisfy

(i) $\pi_{ia} \ge 0$ for all $i, a$,

(ii) $\sum_i \sum_a \pi_{ia} = 1$, (4.33)

(iii) $\sum_a \pi_{ja} = \sum_i \sum_a \pi_{ia} P_{ij}(a)$ for all $j$

Equations (i) and (ii) are obvious, and Equation (iii), which is an analogue of 
Theorem (4.1), follows as the left-hand side equals the steady-state probability of being 
in state j and the right-hand side is the same probability computed by conditioning on 
the state and action chosen one stage earlier. 

Thus for any policy $\beta$, there is a vector $\pi = (\pi_{ia})$ that satisfies (i)-(iii) and with
the interpretation that $\pi_{ia}$ is equal to the steady-state probability of being in state $i$ and
choosing action $a$ when policy $\beta$ is employed. Moreover, it turns out that the reverse is
also true. Namely, for any vector $\pi = (\pi_{ia})$ that satisfies (i)-(iii), there exists a policy
$\beta$ such that if $\beta$ is used, then the steady-state probability of being in $i$ and choosing
action $a$ equals $\pi_{ia}$. To verify this last statement, suppose that $\pi = (\pi_{ia})$ is a vector
that satisfies (i)-(iii). Then, let the policy $\beta = (\beta_i(a))$ be

$$\beta_i(a) = P\{\beta \text{ chooses } a \mid \text{state is } i\} = \frac{\pi_{ia}}{\sum_a \pi_{ia}}$$

Now let $P_{ia}$ denote the limiting probability of being in $i$ and choosing $a$ when policy
$\beta$ is employed. We need to show that $P_{ia} = \pi_{ia}$. To do so, first note that $\{P_{ia}, i =
1, \ldots, M, a \in A\}$ are the limiting probabilities of the two-dimensional Markov chain
$\{(X_n, a_n), n \ge 0\}$. Hence, by the fundamental Theorem 4.1, they are the unique solution
of

(i') $P_{ia} \ge 0$,

(ii') $\sum_i \sum_a P_{ia} = 1$,

(iii') $P_{ja} = \sum_i \sum_{a'} P_{ia'} P_{ij}(a') \beta_j(a)$

where (iii') follows since

$$P\{X_{n+1} = j, a_{n+1} = a \mid X_n = i, a_n = a'\} = P_{ij}(a')\beta_j(a)$$

Because

$$\beta_j(a) = \frac{\pi_{ja}}{\sum_a \pi_{ja}}$$

we see that $(P_{ia})$ is the unique solution of

$$P_{ia} \ge 0,$$

$$\sum_i \sum_a P_{ia} = 1,$$

$$P_{ja} = \sum_i \sum_{a'} P_{ia'} P_{ij}(a') \frac{\pi_{ja}}{\sum_a \pi_{ja}}$$

Hence, to show that $P_{ia} = \pi_{ia}$, we need show that

$$\pi_{ia} \ge 0,$$

$$\sum_i \sum_a \pi_{ia} = 1,$$

$$\pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a') \frac{\pi_{ja}}{\sum_a \pi_{ja}}$$

The top two equations follow from (i) and (ii) of Equation (4.33), and the third, which
is equivalent to

$$\sum_a \pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a')$$

follows from condition (iii) of Equation (4.33).

Thus we have shown that a vector $\pi = (\pi_{ia})$ will satisfy (i), (ii), and (iii) of Equation
(4.33) if and only if there exists a policy $\beta$ such that $\pi_{ia}$ is equal to the steady-state
probability of being in state $i$ and choosing action $a$ when $\beta$ is used. In fact, the policy
$\beta$ is defined by $\beta_i(a) = \pi_{ia} / \sum_a \pi_{ia}$.

The preceding is quite important in the determination of “optimal” policies. For 
instance, suppose that a reward R(i, a) is earned whenever action a is chosen in state i. 
Since $R(X_i, a_i)$ would then represent the reward earned at time $i$, the expected average
reward per unit time under policy $\beta$ can be expressed as

$$\text{expected average reward under } \beta = \lim_{n \to \infty} E_\beta\left[\frac{\sum_{i=1}^{n} R(X_i, a_i)}{n}\right]$$

Now, if $\pi_{ia}$ denotes the steady-state probability of being in state $i$ and choosing action
$a$, it follows that the limiting expected reward at time $n$ equals

$$\lim_{n \to \infty} E[R(X_n, a_n)] = \sum_i \sum_a \pi_{ia} R(i, a)$$

which implies that

$$\text{expected average reward under } \beta = \sum_i \sum_a \pi_{ia} R(i, a)$$
Hence, the problem of determining the policy that maximizes the expected average
reward is

$$\begin{aligned}
\underset{\pi = (\pi_{ia})}{\text{maximize}} \quad & \sum_i \sum_a \pi_{ia} R(i, a) \\
\text{subject to} \quad & \pi_{ia} \ge 0, \quad \text{for all } i, a, \\
& \sum_i \sum_a \pi_{ia} = 1, \\
& \sum_a \pi_{ja} = \sum_i \sum_a \pi_{ia} P_{ij}(a), \quad \text{for all } j
\end{aligned} \tag{4.34}$$

However, the preceding maximization problem is a special case of what is known as a
linear program and can be solved by a standard linear programming algorithm known
as the simplex algorithm.* If $\pi^* = (\pi^*_{ia})$ maximizes the preceding, then the optimal
policy will be given by $\beta^*$ where

$$\beta^*_i(a) = \frac{\pi^*_{ia}}{\sum_a \pi^*_{ia}}$$

Remarks 

(i) It can be shown that there is a $\pi^*$ maximizing Equation (4.34) that has the property
that for each $i$, $\pi^*_{ia}$ is zero for all but one value of $a$, which implies that the optimal
policy is nonrandomized. That is, the action it prescribes when in state $i$ is a
deterministic function of $i$.

(ii) The linear programming formulation also often works when there are restrictions
placed on the class of allowable policies. For instance, suppose there is a restriction
on the fraction of time the process spends in some state, say, state 1. Specifically,
suppose that we are allowed to consider only policies having the property that
their use results in the process being in state 1 less than $100\alpha$ percent of time.
To determine the optimal policy subject to this requirement, we add to the linear
programming problem the additional constraint

$$\sum_a \pi_{1a} \le \alpha$$

since $\sum_a \pi_{1a}$ represents the proportion of time that the process is in state 1.
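
For a very small problem, Remark (i) suggests a simple numerical check that avoids linear programming altogether: enumerate the nonrandomized policies and compare their average rewards. The sketch below does this in plain Python. It is not the simplex algorithm, and it does not handle side constraints such as the one in Remark (ii); the function names, the two-state, two-action transition probabilities P, and the rewards R are all hypothetical values chosen for the illustration, and each policy is assumed to give an ergodic chain.

```python
from itertools import product

def stationary(P, iters=2000):
    """Approximate the stationary vector of an ergodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def best_deterministic_policy(P, R):
    """P[a][i][j] = P_ij(a), R[i][a] = reward.

    Returns (policy, average reward, pi_ia) for the best nonrandomized policy.
    """
    n_states, n_actions = len(R), len(P)
    best = None
    for f in product(range(n_actions), repeat=n_states):
        # Transition matrix under the policy "use action f[i] in state i".
        Pf = [P[f[i]][i] for i in range(n_states)]
        pi = stationary(Pf)
        value = sum(pi[i] * R[i][f[i]] for i in range(n_states))
        if best is None or value > best[1]:
            # pi_ia puts all of state i's mass on the chosen action.
            pi_ia = [[pi[i] if a == f[i] else 0.0 for a in range(n_actions)]
                     for i in range(n_states)]
            best = (f, value, pi_ia)
    return best

# Hypothetical data: 2 states, 2 actions.
P = [
    [[0.5, 0.5], [0.5, 0.5]],   # P_ij(a) for action 0
    [[0.9, 0.1], [0.1, 0.9]],   # P_ij(a) for action 1
]
R = [[1.0, 0.0], [2.0, 3.0]]    # R[i][a]
```

The number of nonrandomized policies grows exponentially with the number of states, so this enumeration is only practical for toy problems; the linear program (4.34) is the formulation that scales.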


* It is called a linear program since the objective function $\sum_i \sum_a R(i, a)\pi_{ia}$ and the constraints are all
linear functions of the $\pi_{ia}$. For a heuristic analysis of the simplex algorithm, see Section 4.5.2.


4.11 Hidden Markov Chains

Let $\{X_n, n = 1, 2, \ldots\}$ be a Markov chain with transition probabilities $P_{i,j}$ and initial
state probabilities $p_i = P\{X_1 = i\}$, $i \ge 0$. Suppose that there is a finite set $\mathcal{S}$
of signals, and that a signal from $\mathcal{S}$ is emitted each time the Markov chain enters a
state. Further, suppose that when the Markov chain enters state $j$ then, independently
of previous Markov chain states and signals, the signal emitted is $s$ with probability
$p(s \mid j)$, $\sum_{s \in \mathcal{S}} p(s \mid j) = 1$. That is, if $S_n$ represents the $n$th signal emitted, then

$$P\{S_1 = s \mid X_1 = j\} = p(s \mid j),$$

$$P\{S_n = s \mid X_1, S_1, \ldots, X_{n-1}, S_{n-1}, X_n = j\} = p(s \mid j)$$

A model of the preceding type in which the sequence of signals $S_1, S_2, \ldots$ is observed,
while the sequence of underlying Markov chain states $X_1, X_2, \ldots$ is unobserved, is
called a hidden Markov chain model.

Example 4.42 Consider a production process that in each period is either in a good 
state (state 1) or in a poor state (state 2). If the process is in state 1 during a period 
then, independent of the past, with probability 0.9 it will be in state 1 during the next 
period and with probability 0.1 it will be in state 2. Once in state 2, it remains in that 
state forever. Suppose that a single item is produced each period and that each item 
produced when the process is in state 1 is of acceptable quality with probability 0.99, 
while each item produced when the process is in state 2 is of acceptable quality with 
probability 0.96. 

If the status, either acceptable or unacceptable, of each successive item is observed, 
while the process states are unobservable, then the preceding is a hidden Markov chain 
model. The signal is the status of the item produced, and has value either a or u, 
depending on whether the item is acceptable or unacceptable. The signal probabilities 
are 


$$p(u \mid 1) = 0.01, \quad p(a \mid 1) = 0.99,$$

$$p(u \mid 2) = 0.04, \quad p(a \mid 2) = 0.96$$

while the transition probabilities of the underlying Markov chain are

$$P_{1,1} = 0.9 = 1 - P_{1,2}, \quad P_{2,2} = 1$$

■

Although $\{S_n, n \ge 1\}$ is not a Markov chain, it should be noted that, conditional on
the current state $X_n$, the sequence $S_n, X_{n+1}, S_{n+1}, \ldots$ of future signals and states is
independent of the sequence $X_1, S_1, \ldots, X_{n-1}, S_{n-1}$ of past states and signals.

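Let $\mathbf{S}^n = (S_1, \ldots, S_n)$ be the random vector of the first $n$ signals. For a fixed
sequence of signals $s_1, \ldots, s_n$, let $\mathbf{s}_k = (s_1, \ldots, s_k)$, $k \le n$. To begin, let us determine
the conditional probability of the Markov chain state at time $n$ given that $\mathbf{S}^n = \mathbf{s}_n$.
To obtain this probability, let

$$F_n(j) = P\{\mathbf{S}^n = \mathbf{s}_n, X_n = j\}$$

and note that

$$P\{X_n = j \mid \mathbf{S}^n = \mathbf{s}_n\} = \frac{P\{\mathbf{S}^n = \mathbf{s}_n, X_n = j\}}{P\{\mathbf{S}^n = \mathbf{s}_n\}} = \frac{F_n(j)}{\sum_i F_n(i)}$$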
Now,

$$\begin{aligned}
F_n(j) &= P\{\mathbf{S}^{n-1} = \mathbf{s}_{n-1}, S_n = s_n, X_n = j\} \\
&= \sum_i P\{\mathbf{S}^{n-1} = \mathbf{s}_{n-1}, X_{n-1} = i, X_n = j, S_n = s_n\} \\
&= \sum_i F_{n-1}(i) P\{X_n = j, S_n = s_n \mid \mathbf{S}^{n-1} = \mathbf{s}_{n-1}, X_{n-1} = i\} \\
&= \sum_i F_{n-1}(i) P\{X_n = j, S_n = s_n \mid X_{n-1} = i\} \\
&= \sum_i F_{n-1}(i) P_{i,j}\, p(s_n \mid j) \\
&= p(s_n \mid j) \sum_i F_{n-1}(i) P_{i,j}
\end{aligned} \tag{4.35}$$

where the preceding used that

$$\begin{aligned}
P\{X_n = j, S_n = s_n \mid X_{n-1} = i\}
&= P\{X_n = j \mid X_{n-1} = i\} \times P\{S_n = s_n \mid X_n = j, X_{n-1} = i\} \\
&= P_{i,j} P\{S_n = s_n \mid X_n = j\} \\
&= P_{i,j}\, p(s_n \mid j)
\end{aligned}$$

Starting with

$$F_1(i) = P\{X_1 = i, S_1 = s_1\} = p_i\, p(s_1 \mid i)$$

we can use Equation (4.35) to recursively determine the functions $F_2(i), F_3(i), \ldots$, up
to $F_n(i)$.

Example 4.43 Suppose in Example 4.42 that $P\{X_1 = 1\} = 0.8$. It is given that the
successive conditions of the first three items produced are $a, u, a$.

(a) What is the probability that the process was in its good state when the third item
was produced?

(b) What is the probability that $X_4$ is 1?

(c) What is the probability that the next item produced is acceptable?

Solution: With $\mathbf{s}_3 = (a, u, a)$, we have

$$F_1(1) = (0.8)(0.99) = 0.792,$$

$$F_1(2) = (0.2)(0.96) = 0.192$$

$$F_2(1) = 0.01[0.792(0.9) + 0.192(0)] = 0.007128,$$

$$F_2(2) = 0.04[0.792(0.1) + 0.192(1)] = 0.010848$$

$$F_3(1) = 0.99[(0.007128)(0.9)] \approx 0.006351,$$

$$F_3(2) = 0.96[(0.007128)(0.1) + 0.010848] \approx 0.011098$$

Therefore, the answer to part (a) is

$$P\{X_3 = 1 \mid \mathbf{s}_3\} \approx \frac{0.006351}{0.006351 + 0.011098} \approx 0.364$$

To compute $P\{X_4 = 1 \mid \mathbf{s}_3\}$, condition on $X_3$ to obtain

$$\begin{aligned}
P\{X_4 = 1 \mid \mathbf{s}_3\} &= P\{X_4 = 1 \mid X_3 = 1, \mathbf{s}_3\} P\{X_3 = 1 \mid \mathbf{s}_3\} \\
&\quad + P\{X_4 = 1 \mid X_3 = 2, \mathbf{s}_3\} P\{X_3 = 2 \mid \mathbf{s}_3\} \\
&= P\{X_4 = 1 \mid X_3 = 1, \mathbf{s}_3\}(0.364) + P\{X_4 = 1 \mid X_3 = 2, \mathbf{s}_3\}(0.636) \\
&= 0.364 P_{1,1} + 0.636 P_{2,1} \\
&= 0.3276
\end{aligned}$$

To compute $P\{S_4 = a \mid \mathbf{s}_3\}$, condition on $X_4$ to obtain

$$\begin{aligned}
P\{S_4 = a \mid \mathbf{s}_3\} &= P\{S_4 = a \mid X_4 = 1, \mathbf{s}_3\} P\{X_4 = 1 \mid \mathbf{s}_3\} \\
&\quad + P\{S_4 = a \mid X_4 = 2, \mathbf{s}_3\} P\{X_4 = 2 \mid \mathbf{s}_3\} \\
&= P\{S_4 = a \mid X_4 = 1\}(0.3276) + P\{S_4 = a \mid X_4 = 2\}(1 - 0.3276) \\
&= (0.99)(0.3276) + (0.96)(0.6724) = 0.9698
\end{aligned}$$

■
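
The forward recursion (4.35) is short enough to check Example 4.43 numerically. The following Python sketch is illustrative only: the function name forward and the data layout are assumptions, and the book's states 1 (good) and 2 (poor) are stored at list indices 0 and 1.

```python
def forward(p_init, P, emit, signals):
    """Return [F_1, ..., F_n]; F_k[j] = P{S^k = s_k, X_k = j}."""
    n_states = len(p_init)
    # F_1(i) = p_i * p(s_1 | i)
    F = [[p_init[j] * emit[j][signals[0]] for j in range(n_states)]]
    for s in signals[1:]:
        prev = F[-1]
        # Equation (4.35): F_n(j) = p(s_n | j) * sum_i F_{n-1}(i) P_{i,j}
        F.append([emit[j][s] * sum(prev[i] * P[i][j] for i in range(n_states))
                  for j in range(n_states)])
    return F

# Data of Examples 4.42 and 4.43 (index 0 = state 1, index 1 = state 2).
P_INIT = [0.8, 0.2]
P = [[0.9, 0.1],
     [0.0, 1.0]]
EMIT = [{"a": 0.99, "u": 0.01},
        {"a": 0.96, "u": 0.04}]

F = forward(P_INIT, P, EMIT, "aua")
post_good = F[2][0] / sum(F[2])                    # part (a)
x4_good = post_good * P[0][0]                      # part (b); P_{2,1} = 0
next_ok = 0.99 * x4_good + 0.96 * (1.0 - x4_good)  # part (c)
```

Up to rounding, post_good, x4_good, and next_ok agree with the values 0.364, 0.3276, and 0.9698 obtained in the solution.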

To compute $P\{\mathbf{S}^n = \mathbf{s}_n\}$, use the identity $P\{\mathbf{S}^n = \mathbf{s}_n\} = \sum_i F_n(i)$ along with
Equation (4.35). If there are $N$ states of the Markov chain, this requires computing $nN$
quantities $F_n(i)$, with each computation requiring a summation over $N$ terms. This can
be compared with a computation of $P\{\mathbf{S}^n = \mathbf{s}_n\}$ based on conditioning on the first $n$
states of the Markov chain to obtain

$$\begin{aligned}
P\{\mathbf{S}^n = \mathbf{s}_n\} &= \sum_{i_1, \ldots, i_n} P\{\mathbf{S}^n = \mathbf{s}_n \mid X_1 = i_1, \ldots, X_n = i_n\} P\{X_1 = i_1, \ldots, X_n = i_n\} \\
&= \sum_{i_1, \ldots, i_n} p(s_1 \mid i_1) \cdots p(s_n \mid i_n)\, p_{i_1} P_{i_1,i_2} P_{i_2,i_3} \cdots P_{i_{n-1},i_n}
\end{aligned}$$

The use of the preceding identity to compute $P\{\mathbf{S}^n = \mathbf{s}_n\}$ would thus require a
summation over $N^n$ terms, with each term being a product of $2n$ values, indicating that it
is not competitive with the previous approach.

The computation of $P\{\mathbf{S}^n = \mathbf{s}_n\}$ by recursively determining the functions $F_k(i)$ is
known as the forward approach. There also is a backward approach, which is based
on the quantities $B_k(i)$, defined by

$$B_k(i) = P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_k = i\}$$

A recursive formula for $B_k(i)$ can be obtained by conditioning on $X_{k+1}$.

$$\begin{aligned}
B_k(i) &= \sum_j P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_k = i, X_{k+1} = j\} P\{X_{k+1} = j \mid X_k = i\} \\
&= \sum_j P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_{k+1} = j\} P_{i,j} \\
&= \sum_j P\{S_{k+1} = s_{k+1} \mid X_{k+1} = j\} \\
&\qquad \times P\{S_{k+2} = s_{k+2}, \ldots, S_n = s_n \mid S_{k+1} = s_{k+1}, X_{k+1} = j\} P_{i,j} \\
&= \sum_j p(s_{k+1} \mid j) P\{S_{k+2} = s_{k+2}, \ldots, S_n = s_n \mid X_{k+1} = j\} P_{i,j} \\
&= \sum_j p(s_{k+1} \mid j) B_{k+1}(j) P_{i,j}
\end{aligned} \tag{4.36}$$

Starting with

$$B_{n-1}(i) = P\{S_n = s_n \mid X_{n-1} = i\} = \sum_j P_{i,j}\, p(s_n \mid j)$$

we would then use Equation (4.36) to determine the function $B_{n-2}(i)$, then $B_{n-3}(i)$,
and so on, down to $B_1(i)$. This would then yield $P\{\mathbf{S}^n = \mathbf{s}_n\}$ via

$$\begin{aligned}
P\{\mathbf{S}^n = \mathbf{s}_n\} &= \sum_i P\{S_1 = s_1, \ldots, S_n = s_n \mid X_1 = i\}\, p_i \\
&= \sum_i P\{S_1 = s_1 \mid X_1 = i\} P\{S_2 = s_2, \ldots, S_n = s_n \mid S_1 = s_1, X_1 = i\}\, p_i \\
&= \sum_i p(s_1 \mid i) P\{S_2 = s_2, \ldots, S_n = s_n \mid X_1 = i\}\, p_i \\
&= \sum_i p(s_1 \mid i) B_1(i)\, p_i
\end{aligned}$$

Another approach to obtaining $P\{\mathbf{S}^n = \mathbf{s}_n\}$ is to combine both the forward and
backward approaches. Suppose that for some $k$ we have computed both functions
$F_k(j)$ and $B_k(j)$. Because

$$\begin{aligned}
P\{\mathbf{S}^n = \mathbf{s}_n, X_k = j\} &= P\{\mathbf{S}^k = \mathbf{s}_k, X_k = j\} \\
&\qquad \times P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid \mathbf{S}^k = \mathbf{s}_k, X_k = j\} \\
&= P\{\mathbf{S}^k = \mathbf{s}_k, X_k = j\} P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_k = j\} \\
&= F_k(j) B_k(j)
\end{aligned}$$

we see that

$$P\{\mathbf{S}^n = \mathbf{s}_n\} = \sum_j F_k(j) B_k(j)$$

The beauty of using the preceding identity to determine $P\{\mathbf{S}^n = \mathbf{s}_n\}$ is that we may
simultaneously compute the sequence of forward functions, starting with $F_1$, as well as
the sequence of backward functions, starting at $B_{n-1}$. The parallel computations can
then be stopped once we have computed both $F_k$ and $B_k$ for some $k$.
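
The identity above can be checked numerically: the sum of F_k(j)B_k(j) over j should not depend on k. The Python sketch below uses the data of Examples 4.42 and 4.43 with states stored at indices 0 and 1. The function names are assumptions, and so is the convention B_n(i) = 1, which is introduced here only so that Equation (4.36) at k = n - 1 reproduces the starting value for B_{n-1}(i) given in the text.

```python
def forward(p_init, P, emit, signals):
    """F[k-1][j] = F_k(j) = P{S^k = s_k, X_k = j}, via Equation (4.35)."""
    N = len(p_init)
    F = [[p_init[j] * emit[j][signals[0]] for j in range(N)]]
    for s in signals[1:]:
        prev = F[-1]
        F.append([emit[j][s] * sum(prev[i] * P[i][j] for i in range(N))
                  for j in range(N)])
    return F

def backward(P, emit, signals):
    """B[k-1][i] = B_k(i) = P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_k = i}."""
    n, N = len(signals), len(P)
    B = [None] * n
    B[n - 1] = [1.0] * N      # assumed convention: B_n(i) = 1
    for k in range(n - 2, -1, -1):
        # Equation (4.36): B_k(i) = sum_j p(s_{k+1} | j) B_{k+1}(j) P_{i,j}
        B[k] = [sum(emit[j][signals[k + 1]] * B[k + 1][j] * P[i][j]
                    for j in range(N))
                for i in range(N)]
    return B

P_INIT = [0.8, 0.2]
P = [[0.9, 0.1], [0.0, 1.0]]
EMIT = [{"a": 0.99, "u": 0.01}, {"a": 0.96, "u": 0.04}]
SIGNALS = "aua"

F = forward(P_INIT, P, EMIT, SIGNALS)
B = backward(P, EMIT, SIGNALS)
# One value of P{S^n = s_n} per choice of k; they should all coincide.
totals = [sum(f * b for f, b in zip(F[k], B[k])) for k in range(len(SIGNALS))]
```

In a real implementation only one value of k is needed, which is what allows the forward and backward passes to run in parallel and meet in the middle.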

4.11.1 Predicting the States 

Suppose the first $n$ observed signals are $\mathbf{s}_n = (s_1, \ldots, s_n)$, and that given this data we
want to predict the first $n$ states of the Markov chain. The best predictor depends on
what we are trying to accomplish. If our objective is to maximize the expected number
of states that are correctly predicted, then for each $k = 1, \ldots, n$ we need to compute
$P\{X_k = j \mid \mathbf{S}^n = \mathbf{s}_n\}$ and then let the value of $j$ that maximizes this quantity be the
predictor of $X_k$. (That is, we take the mode of the conditional probability mass function
of $X_k$, given the sequence of signals, as the predictor of $X_k$.) To do so, we must first
compute this conditional probability mass function, which is accomplished as follows.
For $k \le n$,

$$P\{X_k = j \mid \mathbf{S}^n = \mathbf{s}_n\} = \frac{P\{\mathbf{S}^n = \mathbf{s}_n, X_k = j\}}{P\{\mathbf{S}^n = \mathbf{s}_n\}} = \frac{F_k(j) B_k(j)}{\sum_j F_k(j) B_k(j)}$$

Thus, given that $\mathbf{S}^n = \mathbf{s}_n$, the optimal predictor of $X_k$ is the value of $j$ that maximizes
$F_k(j) B_k(j)$.

A different variant of the prediction problem arises when we regard the sequence 
of states as a single entity. In this situation, our objective is to choose that sequence 
of states whose conditional probability, given the sequence of signals, is maximal. For 
instance, in signal processing, while $X_1, \ldots, X_n$ might be the actual message sent,
$S_1, \ldots, S_n$ would be what is received, and so the objective would be to predict the
actual message in its entirety.

Letting $\mathbf{X}_k = (X_1, \ldots, X_k)$ be the vector of the first $k$ states, the problem of interest
is to find the sequence of states $i_1, \ldots, i_n$ that maximizes
$P\{\mathbf{X}_n = (i_1, \ldots, i_n) \mid \mathbf{S}^n = \mathbf{s}_n\}$. Because

$$P\{\mathbf{X}_n = (i_1, \ldots, i_n) \mid \mathbf{S}^n = \mathbf{s}_n\} = \frac{P\{\mathbf{X}_n = (i_1, \ldots, i_n), \mathbf{S}^n = \mathbf{s}_n\}}{P\{\mathbf{S}^n = \mathbf{s}_n\}}$$

this is equivalent to finding the sequence of states $i_1, \ldots, i_n$ that maximizes
$P\{\mathbf{X}_n = (i_1, \ldots, i_n), \mathbf{S}^n = \mathbf{s}_n\}$.

To solve the preceding problem let, for $k \le n$,

$$V_k(j) = \max_{i_1, \ldots, i_{k-1}} P\{\mathbf{X}_{k-1} = (i_1, \ldots, i_{k-1}), X_k = j, \mathbf{S}^k = \mathbf{s}_k\}$$
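To recursively solve for $V_k(j)$, use that

$$\begin{aligned}
V_k(j) &= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, X_k = j, \mathbf{S}^k = \mathbf{s}_k\} \\
&= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}, X_k = j, S_k = s_k\} \\
&= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&\qquad \times P\{X_k = j, S_k = s_k \mid \mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&\qquad \times P\{X_k = j, S_k = s_k \mid X_{k-1} = i\} \\
&= \max_i P\{X_k = j, S_k = s_k \mid X_{k-1} = i\} \\
&\qquad \times \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&= \max_i P_{i,j}\, p(s_k \mid j) V_{k-1}(i) \\
&= p(s_k \mid j) \max_i P_{i,j} V_{k-1}(i)
\end{aligned} \tag{4.37}$$

Starting with

$$V_1(j) = P\{X_1 = j, S_1 = s_1\} = p_j\, p(s_1 \mid j)$$

we now use the recursive identity (4.37) to determine $V_2(j)$ for each $j$; then $V_3(j)$ for
each $j$; and so on, up to $V_n(j)$ for each $j$.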

To obtain the maximizing sequence of states, we work in the reverse direction. Let
$j_n$ be the value (or any of the values if there are more than one) of $j$ that maximizes
$V_n(j)$. Thus $j_n$ is the final state of a maximizing state sequence. Also, for $k < n$, let
$i_k(j)$ be a value of $i$ that maximizes $P_{i,j} V_k(i)$. Then

$$\begin{aligned}
\max_{i_1, \ldots, i_n} P\{\mathbf{X}_n = (i_1, \ldots, i_n), \mathbf{S}^n = \mathbf{s}_n\}
&= \max_j V_n(j) \\
&= V_n(j_n) \\
&= \max_{i_1, \ldots, i_{n-1}} P\{\mathbf{X}_n = (i_1, \ldots, i_{n-1}, j_n), \mathbf{S}^n = \mathbf{s}_n\} \\
&= p(s_n \mid j_n) \max_i P_{i,j_n} V_{n-1}(i) \\
&= p(s_n \mid j_n) P_{i_{n-1}(j_n), j_n} V_{n-1}(i_{n-1}(j_n))
\end{aligned}$$

Thus, $i_{n-1}(j_n)$ is the next to last state of the maximizing sequence. Continuing in this
manner, the second from the last state of the maximizing sequence is $i_{n-2}(i_{n-1}(j_n))$,
and so on.

The preceding approach to finding the most likely sequence of states given a
prescribed sequence of signals is known as the Viterbi Algorithm.
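
A compact Python sketch of the recursion (4.37) and the backward trace is given below. The function name viterbi and the data layout are assumptions for the illustration; the data are those of Examples 4.42 and 4.43, with the book's states 1 and 2 stored at indices 0 and 1, and ties are broken in favor of the lowest index.

```python
def viterbi(p_init, P, emit, signals):
    """Return (most likely state sequence, its joint probability with the signals)."""
    N = len(p_init)
    # V_1(j) = p_j * p(s_1 | j)
    V = [p_init[j] * emit[j][signals[0]] for j in range(N)]
    back = []                                 # back[k][j] stores i_k(j)
    for s in signals[1:]:
        # i_k(j): a value of i maximizing P_{i,j} V_k(i)
        ptr = [max(range(N), key=lambda i: P[i][j] * V[i]) for j in range(N)]
        # Equation (4.37): V_k(j) = p(s_k | j) max_i P_{i,j} V_{k-1}(i)
        V = [emit[j][s] * P[ptr[j]][j] * V[ptr[j]] for j in range(N)]
        back.append(ptr)
    j = max(range(N), key=lambda state: V[state])   # j_n
    best = V[j]
    path = [j]
    for ptr in reversed(back):                # work in the reverse direction
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path, best

P_INIT = [0.8, 0.2]
P = [[0.9, 0.1], [0.0, 1.0]]
EMIT = [{"a": 0.99, "u": 0.01}, {"a": 0.96, "u": 0.04}]

path, best = viterbi(P_INIT, P, EMIT, "aua")
```

For the signals a, u, a the sketch returns the index path [1, 1, 1], that is, the process being in the poor state (state 2) in all three periods, with joint probability (0.2)(0.96)(0.04)(0.96) = 0.0073728. For long signal sequences the products underflow, so practical implementations usually work with logarithms instead.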
Exercises


*1. Three white and three black balls are distributed in two urns in such a way that
each contains three balls. We say that the system is in state $i$, $i = 0, 1, 2, 3$, if the
first urn contains $i$ white balls. At each step, we draw one ball from each urn and
place the ball drawn from the first urn into the second, and conversely with the
ball from the second urn. Let $X_n$ denote the state of the system after the $n$th step.
Explain why $\{X_n, n = 0, 1, 2, \ldots\}$ is a Markov chain and calculate its transition
probability matrix.

2. Suppose that whether or not it rains today depends on previous weather conditions 
through the last three days. Show how this system may be analyzed by using a 
Markov chain. How many states are needed? 

3. In Exercise 2, suppose that if it has rained for the past three days, then it will rain 
today with probability 0.8; if it did not rain for any of the past three days, then it 
will rain today with probability 0.2; and in any other case the weather today will, 
with probability 0.6, be the same as the weather yesterday. Determine P for this 
Markov chain. 

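*4. Consider a process $\{X_n, n = 0, 1, \ldots\}$, which takes on the values 0, 1, or 2.
Suppose

$$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} =
\begin{cases}
P^{\mathrm{I}}_{ij}, & \text{when } n \text{ is even} \\
P^{\mathrm{II}}_{ij}, & \text{when } n \text{ is odd}
\end{cases}$$

where $\sum_{j=0}^{2} P^{\mathrm{I}}_{ij} = \sum_{j=0}^{2} P^{\mathrm{II}}_{ij} = 1$, $i = 0, 1, 2$. Is $\{X_n, n \ge 0\}$ a Markov chain?
If not, then show how, by enlarging the state space, we may transform it into a
Markov chain.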

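5. A Markov chain $\{X_n, n \ge 0\}$ with states 0, 1, 2, has the transition probability
matrix

$$\begin{bmatrix}
\frac{1}{2} & \frac{1}{3} & \frac{1}{6} \\
0 & \frac{1}{3} & \frac{2}{3} \\
\frac{1}{2} & 0 & \frac{1}{2}
\end{bmatrix}$$

If $P\{X_0 = 0\} = P\{X_0 = 1\} = \frac{1}{4}$, find $E[X_3]$.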

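6. Let the transition probability matrix of a two-state Markov chain be given, as in
Example 4.2, by

$$\mathbf{P} = \begin{bmatrix} p & 1 - p \\ 1 - p & p \end{bmatrix}$$

Show by mathematical induction that

$$\mathbf{P}^{(n)} = \begin{bmatrix}
\frac{1}{2} + \frac{1}{2}(2p - 1)^n & \frac{1}{2} - \frac{1}{2}(2p - 1)^n \\
\frac{1}{2} - \frac{1}{2}(2p - 1)^n & \frac{1}{2} + \frac{1}{2}(2p - 1)^n
\end{bmatrix}$$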
7. In Example 4.4 suppose that it has rained neither yesterday nor the day before
yesterday. What is the probability that it will rain tomorrow? 

8. Suppose that coin 1 has probability 0.7 of coming up heads, and coin 2 has 
probability 0.6 of coming up heads. If the coin flipped today comes up heads, 
then we select coin 1 to flip tomorrow, and if it comes up tails, then we select coin 
2 to flip tomorrow. If the coin initially flipped is equally likely to be coin 1 or coin 2, 
then what is the probability that the coin flipped on the third day after the initial flip 
is coin 1? Suppose that the coin flipped on Monday comes up heads. What is the 
probability that the coin flipped on Friday of the same week also comes up heads? 

9. In a sequence of independent flips of a coin that comes up heads with
probability 0.6, what is the probability that there is a run of three consecutive heads
within the first 10 flips?

10. In Example 4.3, Gary is currently in a cheerful mood. What is the probability that 
he is not in a glum mood on any of the following three days? 

11. In Example 4.3, Gary was in a glum mood four days ago. Given that he hasn’t 
felt cheerful in a week, what is the probability he is feeling glum today? 

12. For a Markov chain $\{X_n, n \ge 0\}$ with transition probabilities $P_{i,j}$, consider the
conditional probability that $X_n = m$ given that the chain started at time 0 in state
$i$ and has not yet entered state $r$ by time $n$, where $r$ is a specified state not equal
to either $i$ or $m$. We are interested in whether this conditional probability is equal
to the $n$ stage transition probability of a Markov chain whose state space does not
include state $r$ and whose transition probabilities are

$$Q_{i,j} = \frac{P_{i,j}}{1 - P_{i,r}}, \quad i, j \ne r$$

Either prove the equality

$$P\{X_n = m \mid X_0 = i, X_k \ne r, k = 1, \ldots, n\} = Q^n_{i,m}$$

or construct a counterexample.

13. Let $\mathbf{P}$ be the transition probability matrix of a Markov chain. Argue that if for some
positive integer $r$, $\mathbf{P}^r$ has all positive entries, then so does $\mathbf{P}^n$, for all integers $n \ge r$.

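14. Specify the classes of the following Markov chains, and determine whether they
are transient or recurrent:

$$\mathbf{P}_1 = \begin{bmatrix}
0 & \frac{1}{2} & \frac{1}{2} \\
\frac{1}{2} & 0 & \frac{1}{2} \\
\frac{1}{2} & \frac{1}{2} & 0
\end{bmatrix}, \qquad
\mathbf{P}_2 = \begin{bmatrix}
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}$$

$$\mathbf{P}_3 = \begin{bmatrix}
\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 \\
\frac{1}{4} & \frac{1}{2} & \frac{1}{4} & 0 & 0 \\
\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 \\
0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\
0 & 0 & 0 & \frac{1}{2} & \frac{1}{2}
\end{bmatrix}, \qquad
\mathbf{P}_4 = \begin{bmatrix}
\frac{1}{4} & \frac{3}{4} & 0 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & \frac{1}{3} & \frac{2}{3} & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}$$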
15. Prove that if the number of states in a Markov chain is $M$, and if state $j$ can be
reached from state $i$, then it can be reached in $M$ steps or less.

*16. Show that if state $i$ is recurrent and state $i$ does not communicate with state $j$,
then $P_{ij} = 0$. This implies that once a process enters a recurrent class of states it
can never leave that class. For this reason, a recurrent class is often referred to as
a closed class.

17. For the random walk of Example 4.18 use the strong law of large numbers to give
another proof that the Markov chain is transient when $p \ne \frac{1}{2}$.

Hint: Note that the state at time $n$ can be written as $\sum_{i=1}^{n} Y_i$ where the $Y_i$s are
independent and $P\{Y_i = 1\} = p = 1 - P\{Y_i = -1\}$. Argue that if $p > \frac{1}{2}$, then,
by the strong law of large numbers, $\sum_{i=1}^{n} Y_i \to \infty$ as $n \to \infty$ and hence the initial
state 0 can be visited only finitely often, and hence must be transient. A similar
argument holds when $p < \frac{1}{2}$.

18. Coin 1 comes up heads with probability 0.6 and coin 2 with probability 0.5. A 
coin is continually flipped until it comes up tails, at which time that coin is put 
aside and we start flipping the other one. 

(a) What proportion of flips use coin 1 ? 

(b) If we start the process with coin 1 what is the probability that coin 2 is used 
on the fifth flip? 

19. For Example 4.4, calculate the proportion of days that it rains. 

20. A transition probability matrix P is said to be doubly stochastic if the sum over
each column equals one; that is,

$$\sum_i P_{ij} = 1, \quad \text{for all } j$$

If such a chain is irreducible and aperiodic and consists of M + 1 states 0, 1, ..., M,
show that the long-run proportions are given by

$$\pi_j = \frac{1}{M+1}, \quad j = 0, 1, \ldots, M$$


*21. A DNA nucleotide has any of four values. A standard model for a mutational
change of the nucleotide at a specific location is a Markov chain model that supposes
that in going from period to period the nucleotide does not change with
probability $1 - 3\alpha$, and if it does change then it is equally likely to change to any
of the other three values, for some $0 < \alpha < \frac13$.

(a) Show that $P^n_{1,1} = \frac14 + \frac34 (1 - 4\alpha)^n$.

(b) What is the long-run proportion of time the chain is in each state? 

22. Let $Y_n$ be the sum of n independent rolls of a fair die. Find

$$\lim_{n \to \infty} P\{Y_n \text{ is a multiple of } 13\}$$

Hint: Define an appropriate Markov chain and apply the results of Exercise 20.

23. In a good weather year the number of storms is Poisson distributed with mean
1; in a bad year it is Poisson distributed with mean 3. Suppose that any year’s
weather conditions depend on past years only through the previous year’s condition.
Suppose that a good year is equally likely to be followed by either a good
or a bad year, and that a bad year is twice as likely to be followed by a bad year
as by a good year. Suppose that last year (call it year 0) was a good year.

(a) Find the expected total number of storms in the next two years (that is, in 
years 1 and 2). 

(b) Find the probability there are no storms in year 3. 

(c) Find the long-run average number of storms per year. 

24. Consider three urns, one colored red, one white, and one blue. The red urn
contains 1 red and 4 blue balls; the white urn contains 3 white balls, 2 red balls, and
2 blue balls; the blue urn contains 4 white balls, 3 red balls, and 2 blue balls. At 
the initial stage, a ball is randomly selected from the red urn and then returned 
to that urn. At every subsequent stage, a ball is randomly selected from the urn 
whose color is the same as that of the ball previously selected and is then returned 
to that urn. In the long run, what proportion of the selected balls are red? What 
proportion are white? What proportion are blue? 

25. Each morning an individual leaves his house and goes for a run. He is equally 
likely to leave either from his front or back door. Upon leaving the house, he 
chooses a pair of running shoes (or goes running barefoot if there are no shoes at 
the door from which he departed). On his return he is equally likely to enter, and 
leave his running shoes, either by the front or back door. If he owns a total of k 
pairs of running shoes, what proportion of the time does he run barefooted? 

26. Consider the following approach to shuffling a deck of n cards. Starting with any 
initial ordering of the cards, one of the numbers 1,2, ..., n is randomly chosen in 
such a manner that each one is equally likely to be selected. If number i is chosen, 
then we take the card that is in position i and put it on top of the deck—that is, 
we put that card in position 1. We then repeatedly perform the same operation. 
Show that, in the limit, the deck is perfectly shuffled in the sense that the resultant
ordering is equally likely to be any of the $n!$ possible orderings.

*27. Each individual in a population of size N is, in each period, either active or
inactive. If an individual is active in a period then, independent of all else, that
individual will be active in the next period with probability $\alpha$. Similarly, if an
individual is inactive in a period then, independent of all else, that individual will
be inactive in the next period with probability $\beta$. Let $X_n$ denote the number of
individuals that are active in period n.

(a) Argue that $\{X_n, n \ge 0\}$ is a Markov chain.

(b) Find $E[X_n \mid X_0 = i]$.

(c) Derive an expression for its transition probabilities.

(d) Find the long-run proportion of time that exactly j people are active.

Hint for (d): Consider first the case where N = 1.

28. Every time that the team wins a game, it wins its next game with probability 0.8;
every time it loses a game, it wins its next game with probability 0.3. If the team
wins a game, then it has dinner together with probability 0.7, whereas if the team
loses then it has dinner together with probability 0.2. What proportion of games
result in a team dinner?

29. An organization has N employees where N is a large number. Each employee
has one of three possible job classifications and changes classifications (independently)
according to a Markov chain with transition probabilities

$$\begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.4 & 0.5 \end{pmatrix}$$

What percentage of employees are in each classification?

30. Three out of every four trucks on the road are followed by a car, while only one 
out of every five cars is followed by a truck. What fraction of vehicles on the road 
are trucks? 

31. A certain town never has two sunny days in a row. Each day is classified as being 
either sunny, cloudy (but dry), or rainy. If it is sunny one day, then it is equally 
likely to be either cloudy or rainy the next day. If it is rainy or cloudy one day, 
then there is one chance in two that it will be the same the next day, and if it 
changes then it is equally likely to be either of the other two possibilities. In the 
long run, what proportion of days are sunny? What proportion are cloudy? 

*32. Each of two switches is either on or off during a day. On day n, each switch will
independently be on with probability

[1 + number of on switches during day n − 1]/4

For instance, if both switches are on during day n − 1, then each will independently
be on during day n with probability 3/4. What fraction of days are both
switches on? What fraction are both off?

33. A professor continually gives exams to her students. She can give three possible
types of exams, and her class is graded as either having done well or badly. Let $p_i$
denote the probability that the class does well on a type i exam, and suppose that
$p_1 = 0.3$, $p_2 = 0.6$, and $p_3 = 0.9$. If the class does well on an exam, then the
next exam is equally likely to be any of the three types. If the class does badly, then
the next exam is always type 1. What proportion of exams are type i, i = 1, 2, 3?

34. A flea moves around the vertices of a triangle in the following manner: Whenever
it is at vertex i it moves to its clockwise neighbor vertex with probability $p_i$ and
to the counterclockwise neighbor with probability $q_i = 1 - p_i$, i = 1, 2, 3.

(a) Find the proportion of time that the flea is at each of the vertices. 

(b) How often does the flea make a counterclockwise move that is then followed 
by five consecutive clockwise moves? 

35. Consider a Markov chain with states 0, 1, 2, 3, 4. Suppose $P_{0,4} = 1$; and suppose
that when the chain is in state i, i > 0, the next state is equally likely to be any
of the states 0, 1, ..., i − 1. Find the limiting probabilities of this Markov chain.

36. The state of a process changes daily according to a two-state Markov chain. If
the process is in state i during one day, then it is in state j the following day with
probability $P_{i,j}$, where

$$P_{0,0} = 0.4, \quad P_{0,1} = 0.6, \quad P_{1,0} = 0.2, \quad P_{1,1} = 0.8$$

Every day a message is sent. If the state of the Markov chain that day is i then
the message sent is “good” with probability $p_i$ and is “bad” with probability
$q_i = 1 - p_i$, i = 0, 1.

(a) If the process is in state 0 on Monday, what is the probability that a good
message is sent on Tuesday?

(b) If the process is in state 0 on Monday, what is the probability that a good
message is sent on Friday?

(c) In the long run, what proportion of messages are good?

(d) Let $Y_n$ equal 1 if a good message is sent on day n and let it equal 2 otherwise.
Is $\{Y_n, n \ge 1\}$ a Markov chain? If so, give its transition probability matrix. If
not, briefly explain why not.

37. Show that the stationary probabilities for the Markov chain having transition
probabilities $P_{i,j}$ are also the stationary probabilities for the Markov chain whose
transition probabilities $Q_{i,j}$ are given by

$$Q_{i,j} = P^k_{i,j}$$

for any specified positive integer k.

38. Capa plays either one or two chess games every day, with the number of games that
she plays on successive days being a Markov chain with transition probabilities

$$P_{1,1} = .2, \quad P_{1,2} = .8, \quad P_{2,1} = .4, \quad P_{2,2} = .6$$

Capa wins each game with probability p. Suppose she plays two games on Monday.

(a) What is the probability that she wins all the games she plays on Tuesday?

(b) What is the expected number of games that she plays on Wednesday?

(c) In the long run, on what proportion of days does Capa win all her games?

39. Consider the one-dimensional symmetric random walk of Example 4.18, which
was shown in that example to be recurrent. Let $\pi_i$ denote the long-run proportion
of time that the chain is in state i.

(a) Argue that $\pi_i = \pi_0$ for all i.

(b) Show that $\sum_i \pi_i \ne 1$.

(c) Conclude that this Markov chain is null recurrent, and thus all $\pi_i = 0$.

40. A particle moves on 12 points situated on a circle. At each step it is equally likely
to move one step in the clockwise or in the counterclockwise direction. Find the
mean number of steps for the particle to return to its starting position.

*41. Consider a Markov chain with states equal to the nonnegative integers, and suppose
its transition probabilities satisfy $P_{i,j} = 0$ if $j \le i$. Assume $X_0 = 0$, and let
$e_j$ be the probability that the Markov chain is ever in state j. (Note that $e_0 = 1$
because $X_0 = 0$.) Argue that for j > 0

$$e_j = \sum_{i=0}^{j-1} e_i P_{i,j}$$

If $P_{i,i+k} = 1/3$, k = 1, 2, 3, find $e_i$ for i = 1, ..., 10.

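42. Let A be a set of states, and let $A^c$ be the remaining states.

(a) What is the interpretation of

$$\sum_{i \in A} \sum_{j \in A^c} \pi_i P_{ij}?$$

(b) What is the interpretation of

$$\sum_{i \in A^c} \sum_{j \in A} \pi_i P_{ij}?$$

(c) Explain the identity

$$\sum_{i \in A} \sum_{j \in A^c} \pi_i P_{ij} = \sum_{i \in A^c} \sum_{j \in A} \pi_i P_{ij}$$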


43. Each day, one of n possible elements is requested, the ith one with probability
$P_i$, $i \ge 1$, $\sum_{i=1}^n P_i = 1$. These elements are at all times arranged in an ordered list
that is revised as follows: The element selected is moved to the front of the list
with the relative positions of all the other elements remaining unchanged. Define
the state at any time to be the list ordering at that time and note that there are $n!$
possible states.

(a) Argue that the preceding is a Markov chain.

(b) For any state $i_1, \ldots, i_n$ (which is a permutation of 1, 2, ..., n), let $\pi(i_1, \ldots, i_n)$
denote the limiting probability. In order for the state to be $i_1, \ldots, i_n$, it is necessary
for the last request to be for $i_1$, the last non-$i_1$ request for $i_2$, the last
non-$i_1$ or $i_2$ request for $i_3$, and so on. Hence, it appears intuitive that

$$\pi(i_1, \ldots, i_n) = P_{i_1} \, \frac{P_{i_2}}{1 - P_{i_1}} \, \frac{P_{i_3}}{1 - P_{i_1} - P_{i_2}} \cdots \frac{P_{i_{n-1}}}{1 - P_{i_1} - \cdots - P_{i_{n-2}}}$$

Verify when n = 3 that the preceding are indeed the limiting probabilities.

44. Suppose that a population consists of a fixed number, say, m, of genes in any
generation. Each gene is one of two possible genetic types. If exactly i (of the m)
genes of any generation are of type 1, then the next generation will have j type
1 (and m − j type 2) genes with probability

$$\binom{m}{j} \left(\frac{i}{m}\right)^j \left(\frac{m-i}{m}\right)^{m-j}, \quad j = 0, 1, \ldots, m$$

Let $X_n$ denote the number of type 1 genes in the nth generation, and assume that
$X_0 = i$.

(a) Find $E[X_n]$.

(b) What is the probability that eventually all the genes will be type 1? 

45. Consider an irreducible finite Markov chain with states 0, 1, ..., N. 

(a) Starting in state i, what is the probability the process will ever visit state j?
Explain!

(b) Let $x_i = P\{\text{visit state } N \text{ before state } 0 \mid \text{start in } i\}$. Compute a set of linear
equations that the $x_i$ satisfy, i = 0, 1, ..., N.

(c) If $\sum_j j P_{ij} = i$ for i = 1, ..., N − 1, show that $x_i = i/N$ is a solution to the
equations in part (b).

46. An individual possesses r umbrellas that he employs in going from his home to
office, and vice versa. If he is at home (the office) at the beginning (end) of a day
and it is raining, then he will take an umbrella with him to the office (home), provided
there is one to be taken. If it is not raining, then he never takes an umbrella.
Assume that, independent of the past, it rains at the beginning (end) of a day with
probability p.

(a) Define a Markov chain with r + 1 states, which will help us to determine the
proportion of time that our man gets wet. (Note: He gets wet if it is raining,
and all umbrellas are at his other location.)

(b) Show that the limiting probabilities are given by

$$\pi_i = \begin{cases} \dfrac{q}{r+q}, & \text{if } i = 0 \\[2mm] \dfrac{1}{r+q}, & \text{if } i = 1, \ldots, r \end{cases} \qquad \text{where } q = 1 - p$$

(c) What fraction of time does our man get wet?

(d) When r = 3, what value of p maximizes the fraction of time he gets wet?

*47. Let $\{X_n, n \ge 0\}$ denote an ergodic Markov chain with limiting probabilities $\pi_i$.
Define the process $\{Y_n, n \ge 1\}$ by $Y_n = (X_{n-1}, X_n)$. That is, $Y_n$ keeps track of
the last two states of the original chain. Is $\{Y_n, n \ge 1\}$ a Markov chain? If so,
determine its transition probabilities and find

$$\lim_{n \to \infty} P\{Y_n = (i, j)\}$$

48. Consider a Markov chain in steady state. Say that a k length run of zeroes ends
at time m if

$$X_{m-k-1} \ne 0, \quad X_{m-k} = X_{m-k+1} = \cdots = X_{m-1} = 0, \quad X_m \ne 0$$

Show that the probability of this event is $\pi_0 (P_{0,0})^{k-1} (1 - P_{0,0})^2$, where $\pi_0$ is the
limiting probability of state 0.

49. Let $P^{(1)}$ and $P^{(2)}$ denote transition probability matrices for ergodic Markov chains
having the same state space. Let $\pi^1$ and $\pi^2$ denote the stationary (limiting) probability
vectors for the two chains. Consider a process defined as follows:

(a) $X_0 = 1$. A coin is then flipped and if it comes up heads, then the remaining
states $X_1, \ldots$ are obtained from the transition probability matrix
$P^{(1)}$ and if tails from the matrix $P^{(2)}$. Is $\{X_n, n \ge 0\}$ a Markov chain?
If p = P{coin comes up heads}, what is $\lim_{n \to \infty} P(X_n = i)$?

(b) $X_0 = 1$. At each stage the coin is flipped and if it comes up heads, then the
next state is chosen according to $P^{(1)}$ and if tails comes up, then it is chosen
according to $P^{(2)}$. In this case do the successive states constitute a Markov
chain? If so, determine the transition probabilities. Show by a counterexample
that the limiting probabilities are not the same as in part (a).

50. In Exercise 8, if today’s flip lands heads, what is the expected number of additional
flips needed until the pattern t, t, h, t, h, t, t occurs?

51. In Example 4.3, Gary is in a cheerful mood today. Find the expected number of 
days until he has been glum for three consecutive days. 

52. A taxi driver provides service in two zones of a city. Fares picked up in zone A will 
have destinations in zone A with probability 0.6 or in zone B with probability 0.4. 
Fares picked up in zone B will have destinations in zone A with probability 0.3 
or in zone B with probability 0.7. The driver’s expected profit for a trip entirely 
in zone A is 6; for a trip entirely in zone B is 8; and for a trip that involves both 
zones is 12. Find the taxi driver’s average profit per trip. 

53. Find the average premium received per policyholder of the insurance company of
Example 4.27 if $\lambda = 1/4$ for one-third of its clients, and $\lambda = 1/2$ for two-thirds
of its clients.

54. Consider the Ehrenfest urn model in which M molecules are distributed between
two urns, and at each time point one of the molecules is chosen at random and is
then removed from its urn and placed in the other one. Let $X_n$ denote the number
of molecules in urn 1 after the nth switch and let $\mu_n = E[X_n]$. Show that

(a) $\mu_{n+1} = 1 + (1 - 2/M)\mu_n$.

(b) Use (a) to prove that

$$\mu_n = \frac{M}{2} + \left(\frac{M-2}{M}\right)^n \left(E[X_0] - \frac{M}{2}\right)$$


55. Consider a population of individuals each of whom possesses two genes that can
be either type A or type a. Suppose that in outward appearance type A is dominant
and type a is recessive. (That is, an individual will have only the outward
characteristics of the recessive gene if its pair is aa.) Suppose that the population
has stabilized, and the percentages of individuals having respective gene pairs
AA, aa, and Aa are p, q, and r. Call an individual dominant or recessive depending
on the outward characteristics it exhibits. Let $S_{11}$ denote the probability that
an offspring of two dominant parents will be recessive; and let $S_{10}$ denote the
probability that the offspring of one dominant and one recessive parent will be
recessive. Compute $S_{11}$ and $S_{10}$ to show that $S_{11} = S_{10}^2$. (The quantities $S_{10}$ and
$S_{11}$ are known in the genetics literature as Snyder’s ratios.)

56. Suppose that on each play of the game a gambler either wins 1 with probability 
p or loses 1 with probability 1 — p. The gambler continues betting until she or he 
is either up n or down m. What is the probability that the gambler quits a winner? 

57. A particle moves among n + 1 vertices that are situated on a circle in the following
manner. At each step it moves one step either in the clockwise direction
with probability p or the counterclockwise direction with probability q = 1 — p. 
Starting at a specified state, call it state 0, let T be the time of the first return to 
state 0. Find the probability that all states have been visited by time T. 

Hint: Condition on the initial transition and then use results from the gambler’s 
ruin problem. 

58. In the gambler’s ruin problem of Section 4.5.1, suppose the gambler’s fortune is
presently i, and suppose that we know that the gambler’s fortune will eventually
reach N (before it goes to 0). Given this information, show that the probability
he wins the next gamble is

$$\begin{cases} \dfrac{p[1 - (q/p)^{i+1}]}{1 - (q/p)^i}, & \text{if } p \ne \frac12 \\[2mm] \dfrac{i+1}{2i}, & \text{if } p = \frac12 \end{cases}$$

Hint: The probability we want is

$$P\{X_{n+1} = i + 1 \mid X_n = i, \lim_{m \to \infty} X_m = N\}
= \frac{P\{X_{n+1} = i + 1, \lim_{m \to \infty} X_m = N \mid X_n = i\}}{P\{\lim_{m \to \infty} X_m = N \mid X_n = i\}}$$

59. For the gambler’s ruin model of Section 4.5.1, let $M_i$ denote the mean number
of games that must be played until the gambler either goes broke or reaches a
fortune of N, given that he starts with i, i = 0, 1, ..., N. Show that $M_i$ satisfies

$$M_0 = M_N = 0; \quad M_i = 1 + pM_{i+1} + qM_{i-1}, \quad i = 1, \ldots, N - 1$$

Solve these equations to obtain

$$M_i = \begin{cases} i(N - i), & \text{if } p = \frac12 \\[2mm] \dfrac{i}{q - p} - \dfrac{N}{q - p} \, \dfrac{1 - (q/p)^i}{1 - (q/p)^N}, & \text{if } p \ne \frac12 \end{cases}$$

60. The following is the transition probability matrix of a Markov chain with states
1, 2, 3, 4

$$P = \begin{pmatrix} .4 & .3 & .2 & .1 \\ .2 & .2 & .2 & .4 \\ .25 & .25 & .5 & 0 \\ .2 & .1 & .4 & .3 \end{pmatrix}$$

If $X_0 = 1$

(a) find the probability that state 3 is entered before state 4;

(b) find the mean number of transitions until either state 3 or state 4 is entered.

61. Suppose in the gambler’s ruin problem that the probability of winning a bet
depends on the gambler’s present fortune. Specifically, suppose that $\alpha_i$ is the
probability that the gambler wins a bet when his or her fortune is i. Given that the
gambler’s initial fortune is i, let P(i) denote the probability that the gambler’s
fortune reaches N before 0.

(a) Derive a formula that relates P(i) to P(i − 1) and P(i + 1).

(b) Using the same approach as in the gambler’s ruin problem, solve the equation
of part (a) for P(i).

(c) Suppose that i balls are initially in urn 1 and N − i are in urn 2, and suppose
that at each stage one of the N balls is randomly chosen, taken from
whichever urn it is in, and placed in the other urn. Find the probability that
the first urn becomes empty before the second.

*62. Consider the particle from Exercise 57. What is the expected number of steps the
particle takes to return to the starting position? What is the probability that all
other positions are visited before the particle returns to its starting state?

63. For the Markov chain with states 1, 2, 3, 4 whose transition probability matrix P
is as specified below find $f_{i3}$ and $s_{i3}$ for i = 1, 2, 3.

$$P = \begin{pmatrix} 0.4 & 0.2 & 0.1 & 0.3 \\ 0.1 & 0.5 & 0.2 & 0.2 \\ 0.3 & 0.4 & 0.2 & 0.1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

64. Consider a branching process having $\mu < 1$. Show that if $X_0 = 1$, then the
expected number of individuals that ever exist in this population is given by
$1/(1 - \mu)$. What if $X_0 = n$?

65. In a branching process having $X_0 = 1$ and $\mu > 1$, prove that $\pi_0$ is the smallest
positive number satisfying Equation (4.20).

Hint: Let $\pi$ be any solution of $\pi = \sum_{j=0}^{\infty} \pi^j P_j$. Show by mathematical induction
that $\pi \ge P\{X_n = 0\}$ for all n, and let $n \to \infty$. In using the induction argue
that

$$P\{X_n = 0\} = \sum_{j=0}^{\infty} (P\{X_{n-1} = 0\})^j P_j$$

66. For a branching process, calculate $\pi_0$ when


(a) $P_0 = \frac14$, $P_2 = \frac34$.

(b) $P_0 = \frac14$, $P_1 = \frac12$, $P_2 = \frac14$.

(c) $P_0 = \frac16$, $P_1 = \frac12$, $P_3 = \frac13$.


67. At all times, an urn contains N balls—some white balls and some black balls. At 
each stage, a coin having probability p, 0 < p < 1, of landing heads is flipped. If 
heads appears, then a ball is chosen at random from the urn and is replaced by a 
white ball; if tails appears, then a ball is chosen from the urn and is replaced by a 
black ball. Let X n denote the number of white balls in the urn after the nth stage. 

(a) Is $\{X_n, n \ge 0\}$ a Markov chain? If so, explain why.

(b) What are its classes? What are their periods? Are they transient or recurrent?

(c) Compute the transition probabilities $P_{ij}$.

(d) Let N = 2. Find the proportion of time in each state.

(e) Based on your answer in part (d) and your intuition, guess the answer for the
limiting probability in the general case.

(f) Prove your guess in part (e) either by showing that Theorem (4.1) is satisfied
or by using the results of Example 4.35.

(g) If p = 1, what is the expected time until there are only white balls in the urn
if initially there are i white and N − i black?

*68. (a) Show that the limiting probabilities of the reversed Markov chain are the same
as for the forward chain by showing that they satisfy the equations

$$\pi_j = \sum_i \pi_i Q_{ij}$$


(b) Give an intuitive explanation for the result of part (a). 

69. M balls are initially distributed among m urns. At each stage one of the balls is
selected at random, taken from whichever urn it is in, and then placed, at random,
in one of the other M − 1 urns. Consider the Markov chain whose state at any time
is the vector $(n_1, \ldots, n_m)$ where $n_i$ denotes the number of balls in urn i. Guess
at the limiting probabilities for this Markov chain and then verify your guess and
show at the same time that the Markov chain is time reversible.

70. A total of m white and m black balls are distributed among two urns, with each
urn containing m balls. At each stage, a ball is randomly selected from each urn
and the two selected balls are interchanged. Let $X_n$ denote the number of black
balls in urn 1 after the nth interchange.

(a) Give the transition probabilities of the Markov chain $\{X_n, n \ge 0\}$.

(b) Without any computations, what do you think are the limiting probabilities
of this chain?

(c) Find the limiting probabilities and show that the stationary chain is time
reversible.

71. It follows from Theorem 4.2 that for a time reversible Markov chain

$$P_{ij} P_{jk} P_{ki} = P_{ik} P_{kj} P_{ji}, \quad \text{for all } i, j, k$$

It turns out that if the state space is finite and $P_{ij} > 0$ for all i, j, then the preceding
is also a sufficient condition for time reversibility. (That is, in this case, we need
only check Equation (4.26) for paths from i to i that have only two intermediate
states.) Prove this.

Hint: Fix i and show that the equations

$$\pi_j P_{jk} = \pi_k P_{kj}$$

are satisfied by $\pi_j = cP_{ij}/P_{ji}$, where c is chosen so that $\sum_j \pi_j = 1$.

72. For a time reversible Markov chain, argue that the rate at which transitions from 
i to j to k occur must equal the rate at which transitions from k to j to i occur. 

73. Show that the Markov chain of Exercise 31 is time reversible. 

74. A group of n processors is arranged in an ordered list. When a job arrives, the first
processor in line attempts it; if it is unsuccessful, then the next in line tries it; if it
too is unsuccessful, then the next in line tries it, and so on. When the job is successfully
processed or after all processors have been unsuccessful, the job leaves
the system. At this point we are allowed to reorder the processors, and a new job
appears. Suppose that we use the one-closer reordering rule, which moves the
processor that was successful one closer to the front of the line by interchanging
its position with the one in front of it. If all processors were unsuccessful (or if
the processor in the first position was successful), then the ordering remains the
same. Suppose that each time processor i attempts a job then, independently of
anything else, it is successful with probability $p_i$.

(a) Define an appropriate Markov chain to analyze this model. 

(b) Show that this Markov chain is time reversible. 

(c) Find the long-run probabilities. 

75. A Markov chain is said to be a tree process if 

(i) $P_{ij} > 0$ whenever $P_{ji} > 0$,

(ii) for every pair of states i and j, $i \ne j$, there is a unique sequence of distinct
states $i = i_0, i_1, \ldots, i_{n-1}, i_n = j$ such that

$$P_{i_k, i_{k+1}} > 0, \quad k = 0, 1, \ldots, n - 1$$

In other words, a Markov chain is a tree process if for every pair of distinct states 
i and j there is a unique way for the process to go from i to j without reentering 
a state (and this path is the reverse of the unique path from j to i). Argue that an 
ergodic tree process is time reversible. 

76. On a chessboard compute the expected number of plays it takes a knight, starting 
in one of the four corners of the chessboard, to return to its initial position if we 
assume that at each play it is equally likely to choose any of its legal moves. (No 
other pieces are on the board.) 

Hint: Make use of Example 4.36.

77. In a Markov decision problem, another criterion often used, different than the
expected average return per unit time, is that of the expected discounted return.
In this criterion we choose a number $\alpha$, $0 < \alpha < 1$, and try to choose a policy so
as to maximize $E\left[\sum_{i=0}^{\infty} \alpha^i R(X_i, a_i)\right]$ (that is, rewards at time n are discounted at
rate $\alpha^n$). Suppose that the initial state is chosen according to the probabilities $b_i$.
That is,

$$P\{X_0 = i\} = b_i, \quad i = 1, \ldots, n$$

For a given policy $\beta$ let $y_{ja}$ denote the expected discounted time that the
process is in state j and action a is chosen. That is,

$$y_{ja} = E_\beta\left[\sum_{n=0}^{\infty} \alpha^n I_{\{X_n = j, a_n = a\}}\right]$$

where for any event A the indicator variable $I_A$ is defined by

$$I_A = \begin{cases} 1, & \text{if } A \text{ occurs} \\ 0, & \text{otherwise} \end{cases}$$

(a) Show that

$$\sum_a y_{ja} = E\left[\sum_{n=0}^{\infty} \alpha^n I_{\{X_n = j\}}\right]$$

or, in other words, $\sum_a y_{ja}$ is the expected discounted time in state j under $\beta$.

(b) Show that

$$\sum_j \sum_a y_{ja} = \frac{1}{1 - \alpha}$$

$$\sum_a y_{ja} = b_j + \alpha \sum_i \sum_a y_{ia} P_{ij}(a)$$

Hint: For the second equation, use the identity

$$I_{\{X_{n+1} = j\}} = \sum_i \sum_a I_{\{X_n = i, a_n = a\}} I_{\{X_{n+1} = j\}}$$

Take expectations of the preceding to obtain

$$E\left[I_{\{X_{n+1} = j\}}\right] = \sum_i \sum_a E\left[I_{\{X_n = i, a_n = a\}}\right] P_{ij}(a)$$

(c) Let $\{y_{ja}\}$ be a set of numbers satisfying

$$\sum_j \sum_a y_{ja} = \frac{1}{1 - \alpha}$$

$$\sum_a y_{ja} = b_j + \alpha \sum_i \sum_a y_{ia} P_{ij}(a) \qquad (4.38)$$

Argue that $y_{ja}$ can be interpreted as the expected discounted time that the
process is in state j and action a is chosen when the initial state is chosen
according to the probabilities $b_j$ and the policy $\beta$, given by

$$\beta_i(a) = \frac{y_{ia}}{\sum_a y_{ia}}$$

is employed.

Hint: Derive a set of equations for the expected discounted times when policy
$\beta$ is used and show that they are equivalent to Equation (4.38).

(d) Argue that an optimal policy with respect to the expected discounted return
criterion can be obtained by first solving the linear program

$$\begin{aligned}
\text{maximize} \quad & \sum_j \sum_a y_{ja} R(j, a), \\
\text{such that} \quad & \sum_j \sum_a y_{ja} = \frac{1}{1 - \alpha}, \\
& \sum_a y_{ja} = b_j + \alpha \sum_i \sum_a y_{ia} P_{ij}(a), \\
& y_{ja} \ge 0, \quad \text{all } j, a;
\end{aligned}$$

and then defining the policy $\beta^*$ by

$$\beta^*_i(a) = \frac{y^*_{ia}}{\sum_a y^*_{ia}}$$

where the $y^*_{ja}$ are the solutions of the linear program.

78. For the Markov chain of Exercise 5, suppose that $p(s \mid j)$ is the probability that
signal s is emitted when the underlying Markov chain state is j, j = 0, 1, 2.

(a) What proportion of emissions are signal s?

(b) What proportion of those times in which signal s is emitted is 0 the underlying
state?

79. In Example 4.43, what is the probability that the first 4 items produced are all
acceptable?


The Exponential Distribution and the Poisson Process



5.1 Introduction 

In making a mathematical model for a real-world phenomenon it is always necessary
to make certain simplifying assumptions so as to render the mathematics tractable. On
the other hand, we cannot make too many simplifying assumptions, for then
our conclusions, obtained from the mathematical model, would not be applicable to
the real-world situation. In short, we must make enough simplifying assumptions
to enable us to handle the mathematics but not so many that the mathematical model
no longer resembles the real-world phenomenon. One simplifying assumption that is
often made is to assume that certain random variables are exponentially distributed.
The reason for this is that the exponential distribution is both relatively easy to work
with and often a good approximation to the actual distribution.

The property of the exponential distribution that makes it easy to analyze is that
it does not deteriorate with time. By this we mean that if the lifetime of an item is
exponentially distributed, then an item that has been in use for ten (or any number of)
hours is as good as a new item in regard to the amount of time remaining until the
item fails. This property will be formally defined in Section 5.2, where it will be shown
that the exponential is the only distribution that possesses it.
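
The "as good as new" claim can be checked numerically. The following is a minimal sketch, assuming the standard survival function $P\{X > t\} = e^{-\lambda t}$ of an exponential lifetime; the rate and the two time values are hypothetical numbers chosen only for illustration.

```python
import math


def survival(t: float, rate: float) -> float:
    """P{X > t} for an exponential lifetime X with the given rate."""
    return math.exp(-rate * t)


def conditional_survival(s: float, t: float, rate: float) -> float:
    """P{X > s + t | X > s}, computed directly from the survival function."""
    return survival(s + t, rate) / survival(s, rate)


rate = 0.1    # hypothetical failure rate per hour
used = 10.0   # hours the item has already been in use
extra = 5.0   # additional hours we ask the item to survive

new_item = survival(extra, rate)                     # a brand-new item
used_item = conditional_survival(used, extra, rate)  # an item already used 10 hours
print(new_item, used_item)
```

Up to floating-point rounding the two printed values agree, and they continue to agree for any other choice of `used`, which is the sense in which the item does not deteriorate.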

In Section 5.3 we shall study counting processes with an emphasis on a kind of
counting process known as the Poisson process. Among the things we shall discover
about this process is its intimate connection with the exponential distribution.


5.2 The Exponential Distribution


5.2.1 Definition

A continuous random variable X is said to have an exponential distribution with parameter
$\lambda$, $\lambda > 0$, if its probability density function is given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

or, equivalently, if its cdf is given by

$$F(x) = \int_{-\infty}^{x} f(y)\,dy = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$


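The mean of the exponential distribution, E[X], is given by

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^{\infty} \lambda x e^{-\lambda x}\,dx$$

Integrating by parts ($u = x$, $dv = \lambda e^{-\lambda x}\,dx$) yields

$$E[X] = -x e^{-\lambda x} \Big|_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = \frac{1}{\lambda}$$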

The moment generating function $\phi(t)$ of the exponential distribution is given by 

$$\phi(t) = E[e^{tX}] = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda \tag{5.1}$$


All the moments of $X$ can now be obtained by differentiating Equation (5.1). For 
example, 

$$E[X^2] = \frac{d^2}{dt^2}\phi(t)\Big|_{t=0} = \frac{2\lambda}{(\lambda - t)^3}\Big|_{t=0} = \frac{2}{\lambda^2}$$

Consequently, 

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$
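
As a quick numerical sanity check of the mean and variance formulas, the following sketch approximates the first two moments by a midpoint Riemann sum. It is illustrative only: the function names, the rate value 2.0, the truncation point, and the step count are assumptions chosen for the example, not part of the text.

```python
import math

def exp_pdf(x, lam):
    """Density of an exponential random variable with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def moment(k, lam, upper=60.0, steps=200_000):
    """Approximate E[X^k] with a midpoint Riemann sum on [0, upper].

    The tail beyond `upper` is ignored, which is harmless when
    lam * upper is large (an assumption of this sketch).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**k * exp_pdf(x, lam)
    return total * h

lam = 2.0                      # illustrative rate
mean = moment(1, lam)          # should be close to 1 / lam
second = moment(2, lam)        # should be close to 2 / lam**2
variance = second - mean**2    # should be close to 1 / lam**2
print(mean, 1 / lam)
print(variance, 1 / lam**2)
```

The two printed pairs should agree to several decimal places, matching $E[X] = 1/\lambda$ and $\operatorname{Var}(X) = 1/\lambda^2$.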


Example 5.1 (Exponential Random Variables and Expected Discounted Returns) 

Suppose that you are receiving rewards at randomly changing rates continuously 
throughout time. Let $R(x)$ denote the random rate at which you are receiving rewards 
at time $x$. For a value $\alpha \ge 0$, called the discount rate, the quantity 

$$R = \int_0^{\infty} e^{-\alpha x} R(x)\,dx$$

represents the total discounted reward. (In certain applications, $\alpha$ is a continuously 
compounded interest rate, and $R$ is the present value of the infinite flow of rewards.) 
Whereas 

$$E[R] = E\left[\int_0^{\infty} e^{-\alpha x} R(x)\,dx\right] = \int_0^{\infty} e^{-\alpha x} E[R(x)]\,dx$$

is the expected total discounted reward, we will show that it is also equal to the expected 
total reward earned up to an exponentially distributed random time with rate $\alpha$. 

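Let $T$ be an exponential random variable with rate $\alpha$ that is independent of all the 
random variables $R(x)$. We want to argue that 

$$\int_0^{\infty} e^{-\alpha x} E[R(x)]\,dx = E\left[\int_0^{T} R(x)\,dx\right]$$

To show this, define for each $x \ge 0$ a random variable $I(x)$ by 

$$I(x) = \begin{cases} 1, & \text{if } x \le T \\ 0, & \text{if } x > T \end{cases}$$

and note that 

$$\int_0^{T} R(x)\,dx = \int_0^{\infty} R(x) I(x)\,dx$$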

Thus, 

$$\begin{aligned}
E\left[\int_0^{T} R(x)\,dx\right] &= E\left[\int_0^{\infty} R(x) I(x)\,dx\right] \\
&= \int_0^{\infty} E[R(x) I(x)]\,dx \\
&= \int_0^{\infty} E[R(x)]\,E[I(x)]\,dx \quad \text{by independence} \\
&= \int_0^{\infty} E[R(x)]\,P\{T \ge x\}\,dx \\
&= \int_0^{\infty} e^{-\alpha x} E[R(x)]\,dx
\end{aligned}$$

Therefore, the expected total discounted reward is equal to the expected total (undiscounted) 
reward earned by a random time that is exponentially distributed with a rate 
equal to the discount rate. ■ 


5.2.2 Properties of the Exponential Distribution 

A random variable $X$ is said to be without memory, or memoryless, if 

$$P\{X > s + t \mid X > t\} = P\{X > s\} \quad \text{for all } s, t \ge 0 \tag{5.2}$$


If we think of X as being the lifetime of some instrument, then Equation (5.2) states 
that the probability that the instrument lives for at least s + t hours given that it has 
survived t hours is the same as the initial probability that it lives for at least s hours. In 
other words, if the instrument is alive at time t, then the distribution of the remaining 
amount of time that it survives is the same as the original lifetime distribution; that is, 
the instrument does not remember that it has already been in use for a time t. 

The condition in Equation (5.2) is equivalent to 

$$\frac{P\{X > s + t,\ X > t\}}{P\{X > t\}} = P\{X > s\}$$

or 

$$P\{X > s + t\} = P\{X > s\}\,P\{X > t\} \tag{5.3}$$

Since Equation (5.3) is satisfied when $X$ is exponentially distributed (for $e^{-\lambda(s+t)} = 
e^{-\lambda s} e^{-\lambda t}$), it follows that exponentially distributed random variables are memoryless. 

Example 5.2 Suppose that the amount of time one spends in a bank is exponentially 
distributed with mean ten minutes, that is, $\lambda = \frac{1}{10}$. What is the probability that a 
customer will spend more than fifteen minutes in the bank? What is the probability that 
a customer will spend more than fifteen minutes in the bank given that she is still in 
the bank after ten minutes? 

Solution: If $X$ represents the amount of time that the customer spends in the bank, 
then the first probability is just 

$$P\{X > 15\} = e^{-15\lambda} = e^{-3/2} \approx 0.223$$

The second question asks for the probability that a customer who has spent ten 
minutes in the bank will have to spend at least five more minutes. However, since the 
exponential distribution does not “remember” that the customer has already spent 
ten minutes in the bank, this must equal the probability that an entering customer 
spends at least five minutes in the bank. That is, the desired probability is just 

$$P\{X > 5\} = e^{-5\lambda} = e^{-1/2} \approx 0.607 \quad ■$$
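
The two probabilities in Example 5.2 can be evaluated directly from the tail $P\{X > x\} = e^{-\lambda x}$. The sketch below computes the conditional probability from its definition rather than by invoking memorylessness, so the agreement of the last two numbers is a check of Equation (5.2). The helper names are assumptions made for this illustration.

```python
import math

def survival(x, lam):
    """P{X > x} for an exponential random variable with rate lam."""
    return math.exp(-lam * x) if x > 0 else 1.0

def conditional_survival(s, t, lam):
    """P{X > s + t | X > t}, computed directly from the definition."""
    return survival(s + t, lam) / survival(t, lam)

lam = 1 / 10  # mean of ten minutes, as in Example 5.2
p_more_than_15 = survival(15, lam)
p_15_given_10 = conditional_survival(5, 10, lam)

print(round(p_more_than_15, 3))
# The next two values coincide, which is the memoryless property.
print(round(p_15_given_10, 3), round(survival(5, lam), 3))
```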


Example 5.3 Consider a post office that is run by two clerks. Suppose that when 
Mr. Smith enters the system he discovers that Mr. Jones is being served by one of 
the clerks and Mr. Brown by the other. Suppose also that Mr. Smith is told that his 
service will begin as soon as either Jones or Brown leaves. If the amount of time that 
a clerk spends with a customer is exponentially distributed with mean $1/\lambda$, what is the 
probability that, of the three customers, Mr. Smith is the last to leave the post office? 

Solution: The answer is obtained by this reasoning: Consider the time at which 
Mr. Smith first finds a free clerk. At this point either Mr. Jones or Mr. Brown would 
have just left and the other one would still be in service. However, by the lack of 
memory of the exponential, it follows that the amount of time that this other man 
(either Jones or Brown) would still have to spend in the post office is exponentially 
distributed with mean $1/\lambda$. That is, it is the same as if he were just starting his service 
at this point. Hence, by symmetry, the probability that he finishes before Smith must 
equal $\frac{1}{2}$. ■ 
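
The symmetry argument of Example 5.3 can be checked by simulation. The following is a minimal sketch, assuming a common service rate and using Python's standard `random.expovariate`; the function names, trial count, and seed are illustrative choices. Note that drawing fresh exponentials for the remaining service times of Jones and Brown already relies on the lack of memory property.

```python
import random

def smith_leaves_last(lam, rng):
    """Simulate one visit; return True if Smith is the last of the three to leave."""
    jones = rng.expovariate(lam)   # remaining service time of Jones
    brown = rng.expovariate(lam)   # remaining service time of Brown
    start = min(jones, brown)      # Smith begins when the first clerk frees up
    smith_done = start + rng.expovariate(lam)
    other_done = max(jones, brown)
    return smith_done > other_done

def estimate(lam=1.0, trials=200_000, seed=0):
    """Monte Carlo estimate of the probability that Smith leaves last."""
    rng = random.Random(seed)
    hits = sum(smith_leaves_last(lam, rng) for _ in range(trials))
    return hits / trials

print(estimate())   # should be close to 1/2, whatever the value of lam
```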

Example 5.4 The dollar amount of damage involved in an automobile accident is 
an exponential random variable with mean 1000. Of this, the insurance company only 
pays that amount exceeding (the deductible amount of) 400. Find the expected value 
and the standard deviation of the amount the insurance company pays per accident. 

Solution: If $X$ is the dollar amount of damage resulting from an accident, then 
the amount paid by the insurance company is $(X - 400)^+$ (where $a^+$ is defined to 
equal $a$ if $a > 0$ and to equal 0 if $a \le 0$). Whereas we could certainly determine 
the expected value and variance of $(X - 400)^+$ from first principles, it is easier to 
condition on whether $X$ exceeds 400. So, let 

$$I = \begin{cases} 1, & \text{if } X > 400 \\ 0, & \text{if } X \le 400 \end{cases}$$

Let $Y = (X - 400)^+$ be the amount paid. By the lack of memory property of the 
exponential, it follows that if a damage amount exceeds 400, then the amount by 
which it exceeds it is exponential with mean 1000. Therefore, 

$$\begin{aligned}
E[Y \mid I = 1] &= 1000 \\
E[Y \mid I = 0] &= 0 \\
\operatorname{Var}(Y \mid I = 1) &= (1000)^2 \\
\operatorname{Var}(Y \mid I = 0) &= 0
\end{aligned}$$

which can be conveniently written as 

$$E[Y \mid I] = 10^3 I, \qquad \operatorname{Var}(Y \mid I) = 10^6 I$$

Because $I$ is a Bernoulli random variable that is equal to 1 with probability $e^{-0.4}$, 
we obtain 

$$E[Y] = E[E[Y \mid I]] = 10^3 E[I] = 10^3 e^{-0.4} \approx 670.32$$

and, by the conditional variance formula 

$$\begin{aligned}
\operatorname{Var}(Y) &= E[\operatorname{Var}(Y \mid I)] + \operatorname{Var}(E[Y \mid I]) \\
&= 10^6 e^{-0.4} + 10^6 e^{-0.4}(1 - e^{-0.4})
\end{aligned}$$

where the final equality used that the variance of a Bernoulli random variable with 
parameter $p$ is $p(1 - p)$. Consequently, 

$$\sqrt{\operatorname{Var}(Y)} \approx 944.09 \quad ■$$
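
The numbers in Example 5.4 can be reproduced from the conditional expectation and conditional variance formulas, and cross-checked by simulating $(X - 400)^+$ directly. This is a sketch under the example's stated assumptions (mean damage 1000, deductible 400); the constant names, trial count, and seed are illustrative.

```python
import math
import random

MEAN_DAMAGE = 1000.0
DEDUCTIBLE = 400.0

# P{X > 400} = e^{-0.4}: the Bernoulli parameter of the indicator I.
p = math.exp(-DEDUCTIBLE / MEAN_DAMAGE)
expected_payment = MEAN_DAMAGE * p
# E[Var(Y | I)] + Var(E[Y | I]) = 10^6 p + 10^6 p (1 - p)
variance_payment = MEAN_DAMAGE**2 * p + MEAN_DAMAGE**2 * p * (1 - p)
std_payment = math.sqrt(variance_payment)

def simulate(trials=200_000, seed=1):
    """Sample mean and sample standard deviation of the insurer's payment."""
    rng = random.Random(seed)
    payments = [max(rng.expovariate(1 / MEAN_DAMAGE) - DEDUCTIBLE, 0.0)
                for _ in range(trials)]
    mean = sum(payments) / trials
    var = sum((y - mean) ** 2 for y in payments) / (trials - 1)
    return mean, math.sqrt(var)

print(round(expected_payment, 2), round(std_payment, 2))
print(simulate())   # should land near the closed-form values
```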

It turns out that not only is the exponential distribution “memoryless,” but it is the 
unique distribution possessing this property. To see this, suppose that $X$ is memoryless 
and let $\bar{F}(x) = P\{X > x\}$. Then by Equation (5.3) it follows that 

$$\bar{F}(s + t) = \bar{F}(s)\bar{F}(t)$$

That is, $\bar{F}(x)$ satisfies the functional equation 

$$g(s + t) = g(s)g(t)$$

However, it turns out that the only right continuous solution of this functional equation is 

$$g(x) = e^{-\lambda x}\ {}^{*}$$

and since a distribution function is always right continuous we must have 

$$\bar{F}(x) = e^{-\lambda x}$$

or 

$$F(x) = P\{X \le x\} = 1 - e^{-\lambda x}$$

which shows that $X$ is exponentially distributed. 

* This is proven as follows: If $g(s + t) = g(s)g(t)$, then 

$$g\left(\frac{2}{n}\right) = g\left(\frac{1}{n} + \frac{1}{n}\right) = g^2\left(\frac{1}{n}\right)$$

and repeating this yields $g(m/n) = g^m(1/n)$. Also, 

$$g(1) = g\left(\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}\right) = g^n\left(\frac{1}{n}\right) \quad \text{or} \quad g\left(\frac{1}{n}\right) = (g(1))^{1/n}$$

Hence $g(m/n) = (g(1))^{m/n}$, which implies, since $g$ is right continuous, that $g(x) = (g(1))^x$. Since $g(1) = 
\left(g\left(\frac{1}{2}\right)\right)^2 \ge 0$ we obtain $g(x) = e^{-\lambda x}$, where $\lambda = -\log(g(1))$. 

Example 5.5 A store must decide how much of a certain commodity to order so as 
to meet next month’s demand, where that demand is assumed to have an exponential 
distribution with rate $\lambda$. If the commodity costs the store $c$ per pound, and can be sold 
at a price of $s > c$ per pound, how much should be ordered so as to maximize the 
store’s expected profit? Assume that any inventory left over at the end of the month is 
worthless and that there is no penalty if the store cannot meet all the demand. 

Solution: Let $X$ equal the demand. If the store orders the amount $t$, then the profit, 
call it $P$, is given by 

$$P = s \min(X, t) - ct$$

Writing 

$$\min(X, t) = X - (X - t)^+$$

we obtain, upon conditioning on whether $X > t$ and then using the lack of memory 
property of the exponential, that 

$$\begin{aligned}
E[(X - t)^+] &= E[(X - t)^+ \mid X > t]\,P\{X > t\} + E[(X - t)^+ \mid X \le t]\,P\{X \le t\} \\
&= E[(X - t)^+ \mid X > t]\,e^{-\lambda t} \\
&= \frac{1}{\lambda} e^{-\lambda t}
\end{aligned}$$

where the final equality used the lack of memory property of exponential random 
variables to conclude that, conditional on $X$ exceeding $t$, the amount by which it 
exceeds it is an exponential random variable with rate $\lambda$. Hence, 

$$E[\min(X, t)] = \frac{1}{\lambda} - \frac{1}{\lambda} e^{-\lambda t}$$

giving that 

$$E[P] = \frac{s}{\lambda} - \frac{s}{\lambda} e^{-\lambda t} - ct$$

Differentiation now yields that the maximal profit is attained when $s e^{-\lambda t} - c = 0$; 
that is, when 

$$t = \frac{1}{\lambda} \log(s/c)$$

Now, suppose that all unsold inventory can be returned for the amount $r < \min(s, c)$ 
per pound; and also that there is a penalty cost $p$ per pound of unmet demand. In 
this case, using our previously derived expression for $E[P]$, we have 

$$E[P] = \frac{s}{\lambda} - \frac{s}{\lambda} e^{-\lambda t} - ct + rE[(t - X)^+] - pE[(X - t)^+]$$

Using that 

$$\min(X, t) = t - (t - X)^+$$

we see that 

$$E[(t - X)^+] = t - E[\min(X, t)] = t - \frac{1}{\lambda} + \frac{1}{\lambda} e^{-\lambda t}$$

Hence, 

$$\begin{aligned}
E[P] &= \frac{s}{\lambda} - \frac{s}{\lambda} e^{-\lambda t} - ct + rt - \frac{r}{\lambda} + \frac{r}{\lambda} e^{-\lambda t} - \frac{p}{\lambda} e^{-\lambda t} \\
&= \frac{s - r}{\lambda} + \frac{r - s - p}{\lambda} e^{-\lambda t} - (c - r)t
\end{aligned}$$

Calculus now yields that the optimal amount to order is 

$$t = \frac{1}{\lambda} \log\left(\frac{s + p - r}{c - r}\right)$$

It is worth noting that the optimal amount to order increases in $s$, $p$, and $r$ and 
decreases in $\lambda$ and $c$. (Are these monotonicity properties intuitive?) ■ 
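
The ordering rule of Example 5.5 is easy to evaluate numerically. The sketch below codes the closed-form expected profit and the optimal order quantity, with the return value $r$ and penalty $p$ defaulting to 0 so that the first part of the example is recovered. The function names and the numerical values are hypothetical, chosen only to illustrate the formulas.

```python
import math

def expected_profit(t, lam, s, c, r=0.0, p=0.0):
    """E[P] when ordering t: (s-r)/lam + (r-s-p)/lam * e^{-lam t} - (c-r) t."""
    return (s - r) / lam + (r - s - p) / lam * math.exp(-lam * t) - (c - r) * t

def optimal_order(lam, s, c, r=0.0, p=0.0):
    """Order quantity t = (1/lam) * log((s + p - r) / (c - r))."""
    return math.log((s + p - r) / (c - r)) / lam

# Illustrative values only: demand rate, sale price, cost, return value, penalty.
lam, s, c, r, p = 0.01, 5.0, 2.0, 1.0, 0.5
t_star = optimal_order(lam, s, c, r, p)

print(t_star, expected_profit(t_star, lam, s, c, r, p))
# Ordering slightly less or slightly more should not do better.
print(expected_profit(t_star - 1, lam, s, c, r, p),
      expected_profit(t_star + 1, lam, s, c, r, p))
```

Trying a few parameter values is also a convenient way to explore the monotonicity question posed at the end of the example.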

The memoryless property is further illustrated by the failure rate function (also 
called the hazard rate function) of the exponential distribution. 

Consider a continuous positive random variable $X$ having distribution function $F$ 
and density $f$. The failure (or hazard) rate function $r(t)$ is defined by 

$$r(t) = \frac{f(t)}{1 - F(t)} \tag{5.4}$$

To interpret $r(t)$, suppose that an item, having lifetime $X$, has survived for $t$ hours, 
and we desire the probability that it does not survive for an additional time $dt$. That is, 
consider $P\{X \in (t, t + dt) \mid X > t\}$. Now, 

$$\begin{aligned}
P\{X \in (t, t + dt) \mid X > t\} &= \frac{P\{X \in (t, t + dt),\ X > t\}}{P\{X > t\}} \\
&= \frac{P\{X \in (t, t + dt)\}}{P\{X > t\}} \\
&\approx \frac{f(t)\,dt}{1 - F(t)} \\
&= r(t)\,dt
\end{aligned}$$

That is, $r(t)$ represents the conditional probability density that a $t$-year-old item will 
fail. 

Suppose now that the lifetime distribution is exponential. Then, by the memoryless 
property, it follows that the distribution of remaining life for a $t$-year-old item is the 
same as for a new item. Hence, $r(t)$ should be constant. This checks out since 

$$r(t) = \frac{f(t)}{1 - F(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$

Thus, the failure rate function for the exponential distribution is constant. The parameter 
$\lambda$ is often referred to as the rate of the distribution. (Note that the rate is the reciprocal 
of the mean, and vice versa.) 

It turns out that the failure rate function $r(t)$ uniquely determines the distribution $F$. 
To prove this, we note by Equation (5.4) that 

$$r(t) = \frac{\frac{d}{dt}F(t)}{1 - F(t)}$$

Integrating both sides yields 

$$\log(1 - F(t)) = -\int_0^t r(s)\,ds + k$$

or 

$$1 - F(t) = e^k \exp\left\{-\int_0^t r(s)\,ds\right\}$$

Letting $t = 0$ shows that $k = 0$ and thus 

$$F(t) = 1 - \exp\left\{-\int_0^t r(s)\,ds\right\}$$

The preceding identity can also be used to show that exponential random variables 
are the only ones that are memoryless. Because if $X$ is memoryless, then its failure rate 
function must be constant. But if $r(t) = c$, then by the preceding equation 

$$1 - F(t) = e^{-ct}$$

showing that the random variable is exponential. 

Example 5.6 Let $X_1, \ldots, X_n$ be independent exponential random variables with 
respective rates $\lambda_1, \ldots, \lambda_n$, where $\lambda_i \ne \lambda_j$ when $i \ne j$. Let $T$ be independent of these 
random variables and suppose that 

$$\sum_{j=1}^{n} P_j = 1 \quad \text{where } P_j = P\{T = j\}$$

The random variable $X_T$ is said to be a hyperexponential random variable. To see how 
such a random variable might originate, imagine that a bin contains $n$ different types 
of batteries, with a type $j$ battery lasting for an exponentially distributed time with rate 
$\lambda_j$, $j = 1, \ldots, n$. Suppose further that $P_j$ is the proportion of batteries in the bin that 
are type $j$ for each $j = 1, \ldots, n$. If a battery is randomly chosen, in the sense that it 
is equally likely to be any of the batteries in the bin, then the lifetime of the battery 
selected will have the hyperexponential distribution specified in the preceding. 

To obtain the distribution function $F$ of $X = X_T$, condition on $T$. This yields 

$$\begin{aligned}
1 - F(t) &= P\{X > t\} \\
&= \sum_{i=1}^{n} P\{X > t \mid T = i\}\,P\{T = i\} \\
&= \sum_{i=1}^{n} P_i e^{-\lambda_i t}
\end{aligned}$$

Differentiation of the preceding yields $f$, the density function of $X$. 

$$f(t) = \sum_{i=1}^{n} \lambda_i P_i e^{-\lambda_i t}$$


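Consequently, the failure rate function of a hyperexponential random variable is 

$$r(t) = \frac{\sum_{j=1}^{n} P_j \lambda_j e^{-\lambda_j t}}{\sum_{i=1}^{n} P_i e^{-\lambda_i t}}$$

By noting that 

$$P\{T = j \mid X > t\} = \frac{P\{X > t \mid T = j\}\,P\{T = j\}}{P\{X > t\}} = \frac{P_j e^{-\lambda_j t}}{\sum_{i=1}^{n} P_i e^{-\lambda_i t}}$$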

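we see that the failure rate function $r(t)$ can also be written as 

$$r(t) = \sum_{j=1}^{n} \lambda_j P\{T = j \mid X > t\}$$

If $\lambda_1 < \lambda_i$, for all $i > 1$, then 

$$\begin{aligned}
P\{T = 1 \mid X > t\} &= \frac{P_1 e^{-\lambda_1 t}}{P_1 e^{-\lambda_1 t} + \sum_{i=2}^{n} P_i e^{-\lambda_i t}} \\
&= \frac{P_1}{P_1 + \sum_{i=2}^{n} P_i e^{-(\lambda_i - \lambda_1)t}} \\
&\to 1 \quad \text{as } t \to \infty
\end{aligned}$$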

Similarly, $P\{T = i \mid X > t\} \to 0$ when $i \ne 1$, thus showing that 

$$\lim_{t \to \infty} r(t) = \min_i \lambda_i$$

That is, as a randomly chosen battery ages its failure rate converges to the failure rate of 
the exponential type having the smallest failure rate, which is intuitive since the longer 
the battery lasts, the more likely it is a battery type with the smallest failure rate. ■ 
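
The limiting behavior in Example 5.6 can be seen numerically. The following sketch evaluates the hyperexponential failure rate for a hypothetical three-type mixture; the mixing proportions and rates are made-up values, and the function name is an assumption. The common factor $e^{-\lambda_{\min} t}$ is divided out of numerator and denominator so that large $t$ does not produce 0/0 in floating point.

```python
import math

def hyperexp_failure_rate(t, probs, rates):
    """r(t) = sum_j P_j lam_j e^{-lam_j t} / sum_i P_i e^{-lam_i t}."""
    lam_min = min(rates)
    # Weights proportional to P{T = j | X > t}; the smallest rate keeps weight P_j.
    weights = [p * math.exp(-(lam - lam_min) * t) for p, lam in zip(probs, rates)]
    return sum(w * lam for w, lam in zip(weights, rates)) / sum(weights)

probs = [0.2, 0.5, 0.3]   # hypothetical proportions P_j (sum to 1)
rates = [1.0, 3.0, 0.5]   # hypothetical rates lambda_j

for t in (0.0, 1.0, 5.0, 20.0, 100.0):
    print(t, hyperexp_failure_rate(t, probs, rates))
```

At $t = 0$ the failure rate is the mixture average $\sum_j P_j \lambda_j$, and as $t$ grows the printed values fall toward the smallest rate.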


5.2.3 Further Properties of the Exponential Distribution 

Let $X_1, \ldots, X_n$ be independent and identically distributed exponential random variables 
having mean $1/\lambda$. It follows from the results of Example 2.39 that $X_1 + \cdots + X_n$ 
has a gamma distribution with parameters $n$ and $\lambda$. Let us now give a second verification 
of this result by using mathematical induction. Because there is nothing to prove when 
$n = 1$, let us start by assuming that $X_1 + \cdots + X_{n-1}$ has density given by 

$$f_{X_1 + \cdots + X_{n-1}}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-2}}{(n - 2)!}$$

Hence, 

$$\begin{aligned}
f_{X_1 + \cdots + X_{n-1} + X_n}(t) &= \int_0^{\infty} f_{X_n}(t - s)\, f_{X_1 + \cdots + X_{n-1}}(s)\,ds \\
&= \int_0^{t} \lambda e^{-\lambda(t - s)}\, \lambda e^{-\lambda s} \frac{(\lambda s)^{n-2}}{(n - 2)!}\,ds \\
&= \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!}
\end{aligned}$$

which proves the result. 

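Another useful calculation is to determine the probability that one exponential 
random variable is smaller than another. That is, suppose that $X_1$ and $X_2$ are independent 
exponential random variables with respective means $1/\lambda_1$ and $1/\lambda_2$; what is 
$P\{X_1 < X_2\}$? This probability is easily calculated by conditioning on $X_1$: 

$$\begin{aligned}
P\{X_1 < X_2\} &= \int_0^{\infty} P\{X_1 < X_2 \mid X_1 = x\}\,\lambda_1 e^{-\lambda_1 x}\,dx \\
&= \int_0^{\infty} P\{x < X_2\}\,\lambda_1 e^{-\lambda_1 x}\,dx \\
&= \int_0^{\infty} e^{-\lambda_2 x}\,\lambda_1 e^{-\lambda_1 x}\,dx \\
&= \int_0^{\infty} \lambda_1 e^{-(\lambda_1 + \lambda_2)x}\,dx \\
&= \frac{\lambda_1}{\lambda_1 + \lambda_2}
\end{aligned} \tag{5.5}$$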

Suppose that $X_1, X_2, \ldots, X_n$ are independent exponential random variables, with $X_i$ 
having rate $\mu_i$, $i = 1, \ldots, n$. It turns out that the smallest of the $X_i$ is exponential with 
a rate equal to the sum of the $\mu_i$. This is shown as follows: 

$$\begin{aligned}
P\{\min(X_1, \ldots, X_n) > x\} &= P\{X_i > x \text{ for each } i = 1, \ldots, n\} \\
&= \prod_{i=1}^{n} P\{X_i > x\} \quad \text{(by independence)} \\
&= \prod_{i=1}^{n} e^{-\mu_i x} \\
&= \exp\left\{-\left(\sum_{i=1}^{n} \mu_i\right) x\right\}
\end{aligned} \tag{5.6}$$
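
Equations (5.5) and (5.6) lend themselves to a quick simulation check. The sketch below draws two independent exponentials with hypothetical rates 2 and 3, and estimates both the mean of their minimum and the probability that the first is the smaller; the function name, rates, trial count, and seed are illustrative assumptions.

```python
import random

def simulate_min_and_order(rates, trials=200_000, seed=2):
    """Return (sample mean of the minimum, fraction of trials where X_1 is smallest)."""
    rng = random.Random(seed)
    total_min = 0.0
    first_is_min = 0
    for _ in range(trials):
        xs = [rng.expovariate(mu) for mu in rates]
        m = min(xs)
        total_min += m
        first_is_min += xs[0] == m
    return total_min / trials, first_is_min / trials

rates = [2.0, 3.0]   # hypothetical rates
mean_min, frac_first = simulate_min_and_order(rates)

# Equation (5.6): the minimum is exponential with rate 2 + 3, so its mean is 1/5.
print(mean_min, 1 / sum(rates))
# Equation (5.5): P{X_1 < X_2} = 2 / (2 + 3).
print(frac_first, rates[0] / sum(rates))
```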


Example 5.7 (Analyzing Greedy Algorithms for the Assignment Problem) A 
group of $n$ people is to be assigned to a set of $n$ jobs, with one person assigned to 
each job. For a given set of $n^2$ values $C_{ij}$, $i, j = 1, \ldots, n$, a cost $C_{ij}$ is incurred when 
person $i$ is assigned to job $j$. The classical assignment problem is to determine the set 
of assignments that minimizes the sum of the $n$ costs incurred. 

Rather than trying to determine the optimal assignment, let us consider two heuristic 
algorithms for solving this problem. The first heuristic is as follows. Assign person 1 
to the job that results in the least cost. That is, person 1 is assigned to job $j_1$ where 
$C(1, j_1) = \min_j C(1, j)$. Now eliminate that job from consideration and assign 
person 2 to the job that results in the least cost. That is, person 2 is assigned to job $j_2$ 
where $C(2, j_2) = \min_{j \ne j_1} C(2, j)$. This procedure is then continued until all $n$ 
persons are assigned. Since this procedure always selects the best job for the person 
under consideration, we will call it Greedy Algorithm A. 

The second algorithm, which we call Greedy Algorithm B, is a more “global” version 
of the first greedy algorithm. It considers all $n^2$ cost values and chooses the pair $i_1, j_1$ 
for which $C(i, j)$ is minimal. It then assigns person $i_1$ to job $j_1$. It then eliminates all 
cost values involving either person $i_1$ or job $j_1$ (so that $(n - 1)^2$ values remain) and 
continues in the same fashion. That is, at each stage it chooses the person and job that 
have the smallest cost among all the unassigned people and jobs. 

Under the assumption that the $C_{ij}$ constitute a set of $n^2$ independent exponential 
random variables each having mean 1, which of the two algorithms results in a smaller 
expected total cost? 

Solution: Suppose first that Greedy Algorithm A is employed. Let $C_i$ denote the 
cost associated with person $i$, $i = 1, \ldots, n$. Now $C_1$ is the minimum of $n$ independent 
exponentials each having rate 1; so by Equation (5.6) it will be exponential with 
rate $n$. Similarly, $C_2$ is the minimum of $n - 1$ independent exponentials with rate 1, 
and so is exponential with rate $n - 1$. Indeed, by the same reasoning $C_i$ will be 
exponential with rate $n - i + 1$, $i = 1, \ldots, n$. Thus, the expected total cost under 
Greedy Algorithm A is 

$$E_A[\text{total cost}] = E[C_1 + \cdots + C_n] = \sum_{i=1}^{n} \frac{1}{n - i + 1} = \sum_{i=1}^{n} \frac{1}{i}$$

Let us now analyze Greedy Algorithm B. Let $C_i$ be the cost of the $i$th person-job 
pair assigned by this algorithm. Since $C_1$ is the minimum of all the $n^2$ values $C_{ij}$, 
it follows from Equation (5.6) that $C_1$ is exponential with rate $n^2$. Now, it follows 
from the lack of memory property of the exponential that the amounts by which the 
other $C_{ij}$ exceed $C_1$ will be independent exponentials with rates 1. As a result, $C_2$ 
is equal to $C_1$ plus the minimum of $(n - 1)^2$ independent exponentials with rate 1. 
Similarly, $C_3$ is equal to $C_2$ plus the minimum of $(n - 2)^2$ independent exponentials 
with rate 1, and so on. Therefore, we see that 

$$\begin{aligned}
E[C_1] &= 1/n^2, \\
E[C_2] &= E[C_1] + 1/(n - 1)^2, \\
E[C_3] &= E[C_2] + 1/(n - 2)^2, \\
&\ \ \vdots \\
E[C_j] &= E[C_{j-1}] + 1/(n - j + 1)^2, \\
&\ \ \vdots \\
E[C_n] &= E[C_{n-1}] + 1
\end{aligned}$$

Therefore, 

$$\begin{aligned}
E[C_1] &= 1/n^2, \\
E[C_2] &= 1/n^2 + 1/(n - 1)^2, \\
E[C_3] &= 1/n^2 + 1/(n - 1)^2 + 1/(n - 2)^2, \\
&\ \ \vdots \\
E[C_n] &= 1/n^2 + 1/(n - 1)^2 + 1/(n - 2)^2 + \cdots + 1
\end{aligned}$$

Adding up all the $E[C_i]$ yields 

$$E_B[\text{total cost}] = n/n^2 + (n - 1)/(n - 1)^2 + (n - 2)/(n - 2)^2 + \cdots + 1 = \sum_{i=1}^{n} \frac{1}{i}$$

The expected cost is thus the same for both greedy algorithms. ■ 
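
The conclusion of Example 5.7 can be checked empirically. The sketch below implements both greedy heuristics on a cost matrix and averages their total costs over many matrices of independent rate-1 exponentials. The function names, the choice $n = 5$, the trial count, and the seed are assumptions made for illustration; people and jobs are indexed from 0 in the code.

```python
import random

def greedy_a(cost):
    """Assign persons 0, 1, ... in order, each to the cheapest remaining job."""
    n = len(cost)
    free_jobs = set(range(n))
    total = 0.0
    for i in range(n):
        j = min(free_jobs, key=lambda job: cost[i][job])
        total += cost[i][j]
        free_jobs.remove(j)
    return total

def greedy_b(cost):
    """Repeatedly pick the cheapest (person, job) pair among those still unassigned."""
    n = len(cost)
    free_people, free_jobs = set(range(n)), set(range(n))
    total = 0.0
    while free_people:
        i, j = min(((a, b) for a in free_people for b in free_jobs),
                   key=lambda pair: cost[pair[0]][pair[1]])
        total += cost[i][j]
        free_people.remove(i)
        free_jobs.remove(j)
    return total

def average_costs(n=5, trials=20_000, seed=3):
    """Average total cost of each heuristic over random exponential cost matrices."""
    rng = random.Random(seed)
    sum_a = sum_b = 0.0
    for _ in range(trials):
        cost = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
        sum_a += greedy_a(cost)
        sum_b += greedy_b(cost)
    return sum_a / trials, sum_b / trials

harmonic = sum(1 / i for i in range(1, 6))   # 1 + 1/2 + ... + 1/5
print(average_costs(), harmonic)
```

Both averages should be close to the harmonic sum, even though on any particular cost matrix the two heuristics can produce different assignments.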


Let $X_1, \ldots, X_n$ be independent exponential random variables, with respective rates 
$\lambda_1, \ldots, \lambda_n$. A useful result, generalizing Equation (5.5), is that $X_i$ is the smallest of 
these with probability $\lambda_i / \sum_j \lambda_j$. This is shown as follows: 

$$P\left\{X_i = \min_j X_j\right\} = P\left\{X_i < \min_{j \ne i} X_j\right\} = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$$

where the final equality uses Equation (5.5) along with the fact that $\min_{j \ne i} X_j$ is 
exponential with rate $\sum_{j \ne i} \lambda_j$. 

Another important fact is that $\min_i X_i$ and the rank ordering of the $X_i$ are independent. 
To see why this is true, consider the conditional probability that $X_{i_1} < X_{i_2} < 
\cdots < X_{i_n}$ given that the minimal value is greater than $t$. Because $\min_i X_i > t$ means that 
all the $X_i$ are greater than $t$, it follows from the lack of memory property of exponential 
random variables that their remaining lives beyond $t$ remain independent exponential 
random variables with their original rates. Consequently, 

$$\begin{aligned}
P\left\{X_{i_1} < \cdots < X_{i_n} \,\Big|\, \min_i X_i > t\right\} &= P\left\{X_{i_1} - t < \cdots < X_{i_n} - t \,\Big|\, \min_i X_i > t\right\} \\
&= P\{X_{i_1} < \cdots < X_{i_n}\}
\end{aligned}$$

That is, we have proven the following. 

Proposition If $X_1, \ldots, X_n$ are independent exponential random variables with respective 
rates $\lambda_1, \ldots, \lambda_n$, then $\min_i X_i$ is exponential with rate $\sum_{i=1}^{n} \lambda_i$. Further, $\min_i X_i$ 
and the rank order of the variables $X_1, \ldots, X_n$ are independent. 

Example 5.8 Suppose you arrive at a post office having two clerks at a moment when 
both are busy but there is no one else waiting in line. You will enter service when either 
clerk becomes free. If service times for clerk $i$ are exponential with rate $\lambda_i$, $i = 1, 2$, 
find $E[T]$, where $T$ is the amount of time that you spend in the post office. 

Solution: Let $R_i$ denote the remaining service time of the customer with clerk $i$, $i = 
1, 2$, and note, by the lack of memory property of exponentials, that $R_1$ and $R_2$ are 
independent exponential random variables with respective rates $\lambda_1$ and $\lambda_2$. Conditioning 
on which of $R_1$ or $R_2$ is the smallest yields 

$$\begin{aligned}
E[T] &= E[T \mid R_1 < R_2]\,P\{R_1 < R_2\} + E[T \mid R_2 \le R_1]\,P\{R_2 \le R_1\} \\
&= E[T \mid R_1 < R_2]\,\frac{\lambda_1}{\lambda_1 + \lambda_2} + E[T \mid R_2 \le R_1]\,\frac{\lambda_2}{\lambda_1 + \lambda_2}
\end{aligned}$$

Now, with $S$ denoting your service time 

$$\begin{aligned}
E[T \mid R_1 < R_2] &= E[R_1 + S \mid R_1 < R_2] \\
&= E[R_1 \mid R_1 < R_2] + E[S \mid R_1 < R_2] \\
&= E[R_1 \mid R_1 < R_2] + \frac{1}{\lambda_1} \\
&= \frac{1}{\lambda_1 + \lambda_2} + \frac{1}{\lambda_1}
\end{aligned}$$

The final equation used that conditional on $R_1 < R_2$ the random variable $R_1$ is the 
minimum of $R_1$ and $R_2$ and is thus exponential with rate $\lambda_1 + \lambda_2$; and also that 
conditional on $R_1 < R_2$ you are served by server 1. 

As we can show in a similar fashion that 

$$E[T \mid R_2 \le R_1] = \frac{1}{\lambda_1 + \lambda_2} + \frac{1}{\lambda_2}$$

we obtain the result 

$$E[T] = \frac{3}{\lambda_1 + \lambda_2}$$

Another way to obtain $E[T]$ is to write $T$ as a sum, take expectations, and then 
condition where needed. This approach yields 

$$\begin{aligned}
E[T] &= E[\min(R_1, R_2) + S] \\
&= E[\min(R_1, R_2)] + E[S] \\
&= \frac{1}{\lambda_1 + \lambda_2} + E[S]
\end{aligned}$$

To compute $E[S]$, we condition on which of $R_1$ and $R_2$ is smallest. 

$$\begin{aligned}
E[S] &= E[S \mid R_1 < R_2]\,\frac{\lambda_1}{\lambda_1 + \lambda_2} + E[S \mid R_2 \le R_1]\,\frac{\lambda_2}{\lambda_1 + \lambda_2} \\
&= \frac{2}{\lambda_1 + \lambda_2} \quad ■
\end{aligned}$$
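
The result $E[T] = 3/(\lambda_1 + \lambda_2)$ of Example 5.8 can be checked with a short simulation. This is a sketch with hypothetical rates 1 and 2; the function names, trial count, and seed are illustrative assumptions.

```python
import random

def time_in_post_office(lam1, lam2, rng):
    """One simulated visit: wait for the first free clerk, then be served by that clerk."""
    r1 = rng.expovariate(lam1)   # remaining service time at clerk 1
    r2 = rng.expovariate(lam2)   # remaining service time at clerk 2
    if r1 < r2:
        return r1 + rng.expovariate(lam1)
    return r2 + rng.expovariate(lam2)

def estimate_mean_time(lam1, lam2, trials=200_000, seed=4):
    """Monte Carlo estimate of E[T]."""
    rng = random.Random(seed)
    return sum(time_in_post_office(lam1, lam2, rng) for _ in range(trials)) / trials

lam1, lam2 = 1.0, 2.0   # hypothetical rates
print(estimate_mean_time(lam1, lam2), 3 / (lam1 + lam2))
```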


Example 5.9 There are $n$ cells in the body, of which cells $1, \ldots, k$ are target cells. 
Associated with each cell is a weight, with $w_i$ being the weight associated with 
cell $i$, $i = 1, \ldots, n$. The cells are destroyed one at a time in a random order, which is 
such that if $S$ is the current set of surviving cells then, independent of the order in which 
the cells not in $S$ have been destroyed, the next cell killed is $i$, $i \in S$, with probability 
$w_i / \sum_{j \in S} w_j$. In other words, the probability that a given surviving cell is the next 
one to be killed is the weight of that cell divided by the sum of the weights of all still 
surviving cells. Let $A$ denote the total number of cells that are still alive at the moment 
when all the cells $1, 2, \ldots, k$ have been killed, and find $E[A]$. 

Solution: Although it would be quite difficult to solve this problem by a direct 
combinatorial argument, a nice solution can be obtained by relating the order in 
which cells are killed to a ranking of independent exponential random variables. 
To do so, let $X_1, \ldots, X_n$ be independent exponential random variables, with $X_i$ 
having rate $w_i$, $i = 1, \ldots, n$. Note that $X_i$ will be the smallest of these exponentials 
with probability $w_i / \sum_j w_j$; further, given that $X_i$ is the smallest, $X_r$ will be the 
next smallest with probability $w_r / \sum_{j \ne i} w_j$; further, given that $X_i$ and $X_r$ are, 
respectively, the first and second smallest, $X_s$, $s \ne i, r$, will be the third smallest 
with probability $w_s / \sum_{j \ne i, r} w_j$; and so on. Consequently, if we let $I_j$ be the index 
of the $j$th smallest of $X_1, \ldots, X_n$ (so that $X_{I_1} < X_{I_2} < \cdots < X_{I_n}$), then the order 
in which the cells are destroyed has the same distribution as $I_1, \ldots, I_n$. So, let us 
suppose that the order in which the cells are killed is determined by the ordering of 
$X_1, \ldots, X_n$. (Equivalently, we can suppose that all cells will eventually be killed, 
with cell $i$ being killed at time $X_i$, $i = 1, \ldots, n$.) 

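If we let $A_j$ equal 1 if cell $j$ is still alive at the moment when all the cells $1, \ldots, k$ 
have been killed, and let it equal 0 otherwise, then 

$$A = \sum_{j=k+1}^{n} A_j$$

Because cell $j$ will be alive at the moment when all the cells $1, \ldots, k$ have been 
killed if $X_j$ is larger than all the values $X_1, \ldots, X_k$, we see that for $j > k$ 

$$\begin{aligned}
E[A_j] &= P\{A_j = 1\} \\
&= P\left\{X_j > \max_{i=1,\ldots,k} X_i\right\} \\
&= \int_0^{\infty} P\left\{X_j > \max_{i=1,\ldots,k} X_i \,\Big|\, X_j = x\right\} w_j e^{-w_j x}\,dx \\
&= \int_0^{\infty} P\{X_i < x \text{ for all } i = 1, \ldots, k\}\, w_j e^{-w_j x}\,dx \\
&= \int_0^{\infty} \prod_{i=1}^{k} \left(1 - e^{-w_i x}\right) w_j e^{-w_j x}\,dx \\
&= \int_0^{1} \prod_{i=1}^{k} \left(1 - y^{w_i / w_j}\right) dy
\end{aligned}$$

where the final equality follows from the substitution $y = e^{-w_j x}$. Thus, we obtain 
the result 

$$E[A] = \sum_{j=k+1}^{n} \int_0^{1} \prod_{i=1}^{k} \left(1 - y^{w_i / w_j}\right) dy = \int_0^{1} \sum_{j=k+1}^{n} \prod_{i=1}^{k} \left(1 - y^{w_i / w_j}\right) dy \quad ■$$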
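
The integral formula for $E[A]$ in Example 5.9 can be compared against a direct simulation of the exponential-clock construction used in the solution. The sketch below is illustrative: the weights, the number of target cells, the midpoint-rule step count, and the function names are all assumptions. Cells are stored in a list with the $k$ target cells first.

```python
import random

def expected_alive_formula(weights, k, steps=20_000):
    """Midpoint-rule value of sum_{j>k} int_0^1 prod_{i<=k} (1 - y^(w_i/w_j)) dy."""
    h = 1.0 / steps
    total = 0.0
    for wj in weights[k:]:
        for m in range(steps):
            y = (m + 0.5) * h
            term = 1.0
            for wi in weights[:k]:
                term *= 1.0 - y ** (wi / wj)
            total += term * h
    return total

def expected_alive_simulated(weights, k, trials=100_000, seed=5):
    """Kill cell i at an exponential time with rate w_i; count non-targets outliving all targets."""
    rng = random.Random(seed)
    alive = 0
    for _ in range(trials):
        times = [rng.expovariate(w) for w in weights]
        last_target = max(times[:k])
        alive += sum(t > last_target for t in times[k:])
    return alive / trials

weights = [2.0, 1.0, 3.0, 0.5]   # hypothetical weights; the first k are target cells
k = 2
print(expected_alive_formula(weights, k), expected_alive_simulated(weights, k))
```

With equal weights and $k = 2$ the integrand for each non-target cell is $(1 - y)^2$, so each contributes $1/3$; this gives a simple hand check of the code.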


Example 5.10 Suppose that customers are in line to receive service that is provided 
sequentially by a server; whenever a service is completed, the next person in line enters 
the service facility. However, each waiting customer will only wait an exponentially 
distributed time with rate $\theta$; if its service has not yet begun by this time then it will 
immediately depart the system. These exponential times, one for each waiting customer, 
are independent. In addition, the service times are independent exponential random 
variables with rate $\mu$. Suppose that someone is presently being served and consider the 
person who is $n$th in line. 

(a) Find $P_n$, the probability that this customer is eventually served. 

(b) Find $W_n$, the conditional expected amount of time this person spends waiting in 
line given that she is eventually served. 

Solution: Consider the $n + 1$ random variables consisting of the remaining service 
time of the person in service along with the $n$ additional exponential departure times 
with rate $\theta$ of the first $n$ in line. 

(a) Given that the smallest of these $n + 1$ independent exponentials is the departure 
time of the $n$th person in line, the conditional probability that this person will be 
served is 0; on the other hand, given that this person’s departure time is not the 
smallest, the conditional probability that this person will be served is the same as if 
it were initially in position $n - 1$. Since the probability that a given departure time 
is the smallest of the $n + 1$ exponentials is $\theta/(n\theta + \mu)$, we obtain 

$$P_n = \frac{(n - 1)\theta + \mu}{n\theta + \mu}\,P_{n-1}$$

Using the preceding with $n - 1$ replacing $n$ gives 

$$P_n = \frac{(n - 1)\theta + \mu}{n\theta + \mu} \cdot \frac{(n - 2)\theta + \mu}{(n - 1)\theta + \mu}\,P_{n-2} = \frac{(n - 2)\theta + \mu}{n\theta + \mu}\,P_{n-2}$$

Continuing in this fashion yields the result 

$$P_n = \frac{\theta + \mu}{n\theta + \mu}\,P_1 = \frac{\mu}{n\theta + \mu}$$

(b) To determine an expression for $W_n$, we use the fact that the minimum of 
independent exponentials is, independent of their rank ordering, exponential with a 
rate equal to the sum of the rates. Since the time until the $n$th person in line enters 
service is the minimum of these $n + 1$ random variables plus the additional time 
thereafter, we see, upon using the lack of memory property of exponential random 
variables, that 

$$W_n = \frac{1}{n\theta + \mu} + W_{n-1}$$

Repeating the preceding argument with successively smaller values of $n$ yields the 
solution 

$$W_n = \sum_{i=1}^{n} \frac{1}{i\theta + \mu} \quad ■$$
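
The two recursions of Example 5.10 can be unrolled mechanically and compared with the closed forms. The sketch below uses exact rational arithmetic so that the comparison is not blurred by rounding. The function names are assumptions, the rates are taken to be integers (or `Fraction` values) for the sake of the illustration, and the starting values $P_0 = 1$ and $W_0 = 0$ are conventions adopted here so that the recursions also cover $n = 1$.

```python
from fractions import Fraction

def served_probability(n, theta, mu):
    """P_n from the recursion P_n = ((n-1) theta + mu) / (n theta + mu) * P_{n-1}."""
    p = Fraction(1)   # convention of this sketch: P_0 = 1
    for k in range(1, n + 1):
        p *= Fraction((k - 1) * theta + mu, k * theta + mu)
    return p

def expected_wait_given_served(n, theta, mu):
    """W_n from the recursion W_n = 1 / (n theta + mu) + W_{n-1}."""
    w = Fraction(0)   # convention of this sketch: W_0 = 0
    for k in range(1, n + 1):
        w += Fraction(1, k * theta + mu)
    return w

theta, mu = 1, 3   # hypothetical integer rates
for n in (1, 2, 4):
    # Closed form for comparison: P_n = mu / (n theta + mu).
    print(n, served_probability(n, theta, mu), Fraction(mu, n * theta + mu),
          expected_wait_given_served(n, theta, mu))
```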


5.2.4 Convolutions of Exponential Random Variables 

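Let $X_i$, $i = 1, \ldots, n$, be independent exponential random variables with respective 
rates $\lambda_i$, $i = 1, \ldots, n$, and suppose that $\lambda_i \ne \lambda_j$ for $i \ne j$. The random variable 
$\sum_{i=1}^{n} X_i$ is said to be a hypoexponential random variable. To compute its probability 
density function, let us start with the case $n = 2$. Now, 

$$\begin{aligned}
f_{X_1 + X_2}(t) &= \int_0^{t} f_{X_1}(s)\, f_{X_2}(t - s)\,ds \\
&= \int_0^{t} \lambda_1 e^{-\lambda_1 s}\, \lambda_2 e^{-\lambda_2 (t - s)}\,ds \\
&= \lambda_1 \lambda_2 e^{-\lambda_2 t} \int_0^{t} e^{-(\lambda_1 - \lambda_2)s}\,ds \\
&= \frac{\lambda_1}{\lambda_1 - \lambda_2}\, \lambda_2 e^{-\lambda_2 t}\left(1 - e^{-(\lambda_1 - \lambda_2)t}\right) \\
&= \frac{\lambda_1}{\lambda_1 - \lambda_2}\, \lambda_2 e^{-\lambda_2 t} + \frac{\lambda_2}{\lambda_2 - \lambda_1}\, \lambda_1 e^{-\lambda_1 t}
\end{aligned}$$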

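Using the preceding, a similar computation yields, when $n = 3$, 

$$f_{X_1 + X_2 + X_3}(t) = \sum_{i=1}^{3} \lambda_i e^{-\lambda_i t} \left(\prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}\right)$$

which suggests the general result 

$$f_{X_1 + \cdots + X_n}(t) = \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}$$

where 

$$C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$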

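We will now prove the preceding formula by induction on $n$. Since we have already 
established it for $n = 2$, assume it for $n$ and consider $n + 1$ arbitrary independent 
exponentials $X_i$ with distinct rates $\lambda_i$, $i = 1, \ldots, n + 1$. If necessary, renumber $X_1$ and 
$X_{n+1}$ so that $\lambda_{n+1} < \lambda_1$. Now, 

$$\begin{aligned}
f_{X_1 + \cdots + X_{n+1}}(t) &= \int_0^{t} f_{X_1 + \cdots + X_n}(s)\, \lambda_{n+1} e^{-\lambda_{n+1}(t - s)}\,ds \\
&= \sum_{i=1}^{n} C_{i,n} \int_0^{t} \lambda_i e^{-\lambda_i s}\, \lambda_{n+1} e^{-\lambda_{n+1}(t - s)}\,ds \\
&= \sum_{i=1}^{n} C_{i,n} \left(\frac{\lambda_i}{\lambda_i - \lambda_{n+1}}\, \lambda_{n+1} e^{-\lambda_{n+1} t} + \frac{\lambda_{n+1}}{\lambda_{n+1} - \lambda_i}\, \lambda_i e^{-\lambda_i t}\right) \\
&= K_{n+1} \lambda_{n+1} e^{-\lambda_{n+1} t} + \sum_{i=1}^{n} C_{i,n+1} \lambda_i e^{-\lambda_i t}
\end{aligned} \tag{5.7}$$

where $K_{n+1} = \sum_{i=1}^{n} C_{i,n} \lambda_i / (\lambda_i - \lambda_{n+1})$ is a constant that does not depend on $t$. But, 
we also have that 

$$f_{X_1 + \cdots + X_{n+1}}(t) = \int_0^{t} f_{X_2 + \cdots + X_{n+1}}(s)\, \lambda_1 e^{-\lambda_1 (t - s)}\,ds$$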
which implies, by the same argument that resulted in Equation (5.7), that for a constant 
$K_1$ 

$$f_{X_1 + \cdots + X_{n+1}}(t) = K_1 \lambda_1 e^{-\lambda_1 t} + \sum_{i=2}^{n+1} C_{i,n+1} \lambda_i e^{-\lambda_i t}$$

Equating these two expressions for $f_{X_1 + \cdots + X_{n+1}}(t)$ yields 

$$K_{n+1} \lambda_{n+1} e^{-\lambda_{n+1} t} + C_{1,n+1} \lambda_1 e^{-\lambda_1 t} = K_1 \lambda_1 e^{-\lambda_1 t} + C_{n+1,n+1} \lambda_{n+1} e^{-\lambda_{n+1} t}$$

Multiplying both sides of the preceding equation by $e^{\lambda_{n+1} t}$ and then letting $t \to \infty$ 
yields [since $e^{-(\lambda_1 - \lambda_{n+1})t} \to 0$ as $t \to \infty$] 

$$K_{n+1} = C_{n+1,n+1}$$

and this, using Equation (5.7), completes the induction proof. Thus, we have shown 
that if $S = \sum_{i=1}^{n} X_i$, then 

$$f_S(t) = \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t} \tag{5.8}$$

where 

$$C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$

Integrating both sides of the expression for $f_S$ from $t$ to $\infty$ yields that the tail distribution 
function of $S$ is given by 

$$P\{S > t\} = \sum_{i=1}^{n} C_{i,n} e^{-\lambda_i t} \tag{5.9}$$

Hence, we obtain from Equations (5.8) and (5.9) that the failure rate function 
of $S$ is as follows: 

$$r_S(t) = \frac{\sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}}{\sum_{i=1}^{n} C_{i,n} e^{-\lambda_i t}}$$

If we let $\lambda_j = \min(\lambda_1, \ldots, \lambda_n)$, then it follows, upon multiplying the numerator and 
denominator of $r_S(t)$ by $e^{\lambda_j t}$, that 

$$\lim_{t \to \infty} r_S(t) = \lambda_j$$

From the preceding, we can conclude that the remaining lifetime of a hypoexponentially 
distributed item that has survived to age $t$ is, for $t$ large, approximately that of an 
exponentially distributed random variable with a rate equal to the minimum of the rates 
of the random variables whose sums make up the hypoexponential. 
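
Equations (5.8) and (5.9) translate directly into code. The following sketch computes the coefficients $C_{i,n}$ for a hypothetical set of three distinct rates and evaluates the density, tail, and failure rate; the function names and rate values are illustrative assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math

def coefficients(rates):
    """C_{i,n} = prod_{j != i} lam_j / (lam_j - lam_i); the rates must be distinct."""
    return [math.prod(lj / (lj - li) for j, lj in enumerate(rates) if j != i)
            for i, li in enumerate(rates)]

def density(t, rates):
    """Equation (5.8): density of the sum of independent exponentials."""
    return sum(c * lam * math.exp(-lam * t)
               for c, lam in zip(coefficients(rates), rates))

def tail(t, rates):
    """Equation (5.9): P{S > t}."""
    return sum(c * math.exp(-lam * t)
               for c, lam in zip(coefficients(rates), rates))

def failure_rate(t, rates):
    """r_S(t) = f_S(t) / P{S > t}."""
    return density(t, rates) / tail(t, rates)

rates = [1.0, 2.0, 4.0]   # hypothetical distinct rates
print(coefficients(rates))             # some coefficients are negative, yet they sum to 1
print([failure_rate(t, rates) for t in (0.5, 2.0, 10.0, 30.0)])
```

For these rates the failure rate starts at 0 (the density vanishes at $t = 0$ when $n \ge 2$) and climbs toward the smallest rate, in line with the limit derived above.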

Remark Although 

$$1 = \int_0^{\infty} f_S(t)\,dt = \sum_{i=1}^{n} C_{i,n} = \sum_{i=1}^{n} \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$

it should not be thought that the $C_{i,n}$, $i = 1, \ldots, n$, are probabilities, because some of 
them will be negative. Thus, while the form of the hypoexponential density is similar 
to that of the hyperexponential density (see Example 5.6) these two random variables 
are very different. 

Example 5.11 Let $X_1, \ldots, X_m$ be independent exponential random variables with 
respective rates $\lambda_1, \ldots, \lambda_m$, where $\lambda_i \ne \lambda_j$ when $i \ne j$. Let $N$ be independent of 
these random variables and suppose that $\sum_{n=1}^{m} P_n = 1$, where $P_n = P\{N = n\}$. The 
random variable 

$$Y = \sum_{j=1}^{N} X_j$$

is said to be a Coxian random variable. Conditioning on $N$ gives its density function: 

$$\begin{aligned}
f_Y(t) &= \sum_{n=1}^{m} f_Y(t \mid N = n)\,P_n \\
&= \sum_{n=1}^{m} f_{X_1 + \cdots + X_n}(t \mid N = n)\,P_n \\
&= \sum_{n=1}^{m} f_{X_1 + \cdots + X_n}(t)\,P_n \\
&= \sum_{n=1}^{m} P_n \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}
\end{aligned}$$

Let 

$$r(n) = P\{N = n \mid N \ge n\}$$

If we interpret $N$ as a lifetime measured in discrete time periods, then $r(n)$ denotes the 
probability that an item will die in its $n$th period of use given that it has survived up to 
that time. Thus, $r(n)$ is the discrete time analog of the failure rate function $r(t)$, and is 
correspondingly referred to as the discrete time failure (or hazard) rate function. 

Coxian random variables often arise in the following manner. Suppose that an item 
must go through m stages of treatment to be cured. However, suppose that after each 
stage there is a probability that the item will quit the program. If we suppose that 
the amounts of time that it takes the item to pass through the successive stages are 
independent exponential random variables, and that the probability that an item that 
has just completed stage n quits the program is (independent of how long it took to 
go through the n stages) equal to r(n), then the total time that an item spends in the 
program is a Coxian random variable. ■ 
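
The staged-treatment description can be turned into a small computation. The following sketch is an illustration, not part of the original example: it converts quit probabilities $r(1), \ldots, r(m)$ into $P_n = r(n) \prod_{k<n} (1 - r(k))$ and then uses $E[Y] = \sum_n P_n (1/\lambda_1 + \cdots + 1/\lambda_n)$, which holds because $N$ is independent of the stage times. The function names and numerical values are hypothetical.

```python
def stage_count_pmf(hazard):
    """Turn quit probabilities r(1..m) into P_n = P{N = n}; expects hazard[-1] == 1."""
    pmf, survive = [], 1.0
    for r in hazard:
        pmf.append(survive * r)   # completed stages 1..n, then quit after stage n
        survive *= 1.0 - r        # continued past stage n
    return pmf

def coxian_mean(rates, hazard):
    """E[Y] = sum_n P_n * (1/rate_1 + ... + 1/rate_n)."""
    mean, elapsed = 0.0, 0.0
    for p, rate in zip(stage_count_pmf(hazard), rates):
        elapsed += 1.0 / rate     # expected time to finish the first n stages
        mean += p * elapsed
    return mean

print(stage_count_pmf([0.5, 0.5, 1.0]))               # [0.5, 0.25, 0.25]
print(coxian_mean([1.0, 2.0, 4.0], [0.5, 0.5, 1.0]))  # 1.3125
```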


5.3 The Poisson Process 

5.3.1 Counting Processes 

A stochastic process $\{N(t), t \ge 0\}$ is said to be a counting process if $N(t)$ represents
the total number of “events” that occur by time $t$. Some examples of counting processes
are the following:

(a) If we let $N(t)$ equal the number of persons who enter a particular store at or prior
to time $t$, then $\{N(t), t \ge 0\}$ is a counting process in which an event corresponds
to a person entering the store. Note that if we had let $N(t)$ equal the number of
persons in the store at time $t$, then $\{N(t), t \ge 0\}$ would not be a counting process
(why not?).

(b) If we say that an event occurs whenever a child is born, then $\{N(t), t \ge 0\}$ is a
counting process when $N(t)$ equals the total number of people who were born by
time $t$. (Does $N(t)$ include persons who have died by time $t$? Explain why it must.)

(c) If $N(t)$ equals the number of goals that a given soccer player scores by time $t$, then
$\{N(t), t \ge 0\}$ is a counting process. An event of this process will occur whenever
the soccer player scores a goal.

From its definition we see that for a counting process N(t) must satisfy: 

(i) $N(t) \ge 0$.

(ii) $N(t)$ is integer valued.

(iii) If $s < t$, then $N(s) \le N(t)$.

(iv) For $s < t$, $N(t) - N(s)$ equals the number of events that occur in the interval
$(s, t]$.
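
As a concrete illustration of properties (i) through (iv), a counting process can be represented by a sorted list of event times. This is only a sketch; the event times below are made up and `count_by` is a hypothetical helper name.

```python
from bisect import bisect_right

def count_by(event_times, t):
    """N(t): number of events at or before time t (event_times must be sorted)."""
    return bisect_right(event_times, t)

events = [0.7, 1.2, 2.0, 3.5, 4.1]     # made-up event times
print(count_by(events, 1.2))            # 2 events by time 1.2
print(count_by(events, 4.0) - count_by(events, 1.2))   # 2 events in (1.2, 4.0]
```

The returned value is a nonnegative integer, it never decreases in `t`, and differences count the events in the half-open interval $(s, t]$.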

A counting process is said to possess independent increments if the numbers of 
events that occur in disjoint time intervals are independent. For example, this means 
that the number of events that occur by time 10 (that is, $N(10)$) must be independent
of the number of events that occur between times 10 and 15 (that is, $N(15) - N(10)$).

The assumption of independent increments might be reasonable for example (a),
but it probably would be unreasonable for example (b). The reason for this is that if in
example (b) $N(t)$ is very large, then it is probable that there are many people alive at
time $t$; this would lead us to believe that the number of new births between time $t$ and
time $t + s$ would also tend to be large (that is, it does not seem reasonable that $N(t)$
is independent of $N(t + s) - N(t)$, and so $\{N(t), t \ge 0\}$ would not have independent
increments in example (b)). The assumption of independent increments in example (c)
would be justified if we believed that the soccer player’s chances of scoring a goal
today do not depend on “how he’s been going.” It would not be justified if we believed
in “hot streaks” or “slumps.”

A counting process is said to possess stationary increments if the distribution of the 
number of events that occur in any interval of time depends only on the length of the 
time interval. In other words, the process has stationary increments if the number of 
events in the interval $(s, s + t)$ has the same distribution for all $s$.

The assumption of stationary increments would only be reasonable in example (a) 
if there were no times of day at which people were more likely to enter the store. 
Thus, for instance, if there was a rush hour (say, between 12 P.M. and 1 P.M.) each day, 
then the stationarity assumption would not be justified. If we believed that the earth’s 
population is basically constant (a belief not held at present by most scientists), then 
the assumption of stationary increments might be reasonable in example (b). Stationary 
increments do not seem to be a reasonable assumption in example (c) since, for one 
thing, most people would agree that the soccer player would probably score more goals 
while in the age bracket 25-30 than he would while in the age bracket 35-40. It may,
however, be reasonable over a smaller time horizon, such as one year. 


5.3.2 Definition of the Poisson Process 


One of the most important types of counting process is the Poisson process. As a prelude
to giving its definition, we define the concept of a function $f(\cdot)$ being $o(h)$.

Definition 5.1 The function $f(\cdot)$ is said to be $o(h)$ if

$$\lim_{h \to 0} \frac{f(h)}{h} = 0$$

Example 5.12

(a) The function $f(x) = x^2$ is $o(h)$ since

$$\lim_{h \to 0} \frac{f(h)}{h} = \lim_{h \to 0} \frac{h^2}{h} = \lim_{h \to 0} h = 0$$

(b) The function $f(x) = x$ is not $o(h)$ since

$$\lim_{h \to 0} \frac{f(h)}{h} = \lim_{h \to 0} \frac{h}{h} = \lim_{h \to 0} 1 = 1 \ne 0$$

(c) If $f(\cdot)$ is $o(h)$ and $g(\cdot)$ is $o(h)$, then so is $f(\cdot) + g(\cdot)$. This follows since

$$\lim_{h \to 0} \frac{f(h) + g(h)}{h} = \lim_{h \to 0} \frac{f(h)}{h} + \lim_{h \to 0} \frac{g(h)}{h} = 0 + 0 = 0$$

(d) If $f(\cdot)$ is $o(h)$, then so is $g(\cdot) = c f(\cdot)$. This follows since

$$\lim_{h \to 0} \frac{c f(h)}{h} = c \lim_{h \to 0} \frac{f(h)}{h} = c \cdot 0 = 0$$

(e) From (c) and (d) it follows that any finite linear combination of functions, each of
which is $o(h)$, is $o(h)$. ■

In order for the function $f(\cdot)$ to be $o(h)$ it is necessary that $f(h)/h$ go to zero as $h$
goes to zero. But if $h$ goes to zero, the only way for $f(h)/h$ to go to zero is for $f(h)$ to
go to zero faster than $h$ does. That is, for $h$ small, $f(h)$ must be small compared with $h$.
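
A quick numerical look at the ratio $f(h)/h$ makes the distinction concrete. The sketch below simply tabulates the ratio for the two functions from parts (a) and (b) of Example 5.12; the helper name `ratio` is an illustrative choice.

```python
def ratio(f, h):
    """The quantity f(h)/h whose limit as h -> 0 decides whether f is o(h)."""
    return f(h) / h

for h in (1e-1, 1e-3, 1e-6):
    # x**2 gives a ratio that shrinks with h; x gives a ratio stuck at 1
    print(h, ratio(lambda x: x**2, h), ratio(lambda x: x, h))
```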

The $o(h)$ notation can be used to make statements more precise. For instance, if
$X$ is continuous with density $f$ and failure rate function $\lambda(t)$, then the approximate
statements

$$P\{t < X < t + h\} \approx f(t)\,h$$
$$P\{t < X < t + h \mid X > t\} \approx \lambda(t)\,h$$

can be precisely expressed as

$$P\{t < X < t + h\} = f(t)\,h + o(h)$$
$$P\{t < X < t + h \mid X > t\} = \lambda(t)\,h + o(h)$$

We are now in position to define the Poisson process. 

Definition 5.2 The counting process $\{N(t), t \ge 0\}$ is said to be a Poisson process
with rate $\lambda > 0$ if the following axioms hold:

(i) $N(0) = 0$

(ii) $\{N(t), t \ge 0\}$ has independent increments

(iii) $P\{N(t + h) - N(t) = 1\} = \lambda h + o(h)$

(iv) $P\{N(t + h) - N(t) \ge 2\} = o(h)$

The preceding is called a Poisson process because the number of events in any interval
of length $t$ is Poisson distributed with mean $\lambda t$, as is shown by the following important
theorem.

Theorem 5.1 If $\{N(t), t \ge 0\}$ is a Poisson process with rate $\lambda > 0$, then for all
$s > 0$, $t > 0$, $N(s + t) - N(s)$ is a Poisson random variable with mean $\lambda t$. That is, the
number of events in any interval of length $t$ is a Poisson random variable with mean $\lambda t$.

Proof. We begin by deriving $E[e^{-uN(t)}]$, the Laplace transform of $N(t)$. To do so,
fix $u > 0$ and define

$$g(t) = E[e^{-uN(t)}]$$

We will obtain $g(t)$ by deriving a differential equation as follows.

$$\begin{aligned}
g(t + h) &= E[e^{-uN(t+h)}] \\
&= E[e^{-u(N(t) + N(t+h) - N(t))}] \\
&= E[e^{-uN(t)} e^{-u(N(t+h) - N(t))}] \\
&= E[e^{-uN(t)}]\, E[e^{-u(N(t+h) - N(t))}] \quad \text{(by independent increments)} \\
&= g(t)\, E[e^{-u(N(t+h) - N(t))}]
\end{aligned} \tag{5.10}$$

Now, from Axioms (iii) and (iv)

$$P\{N(t + h) - N(t) = 0\} = 1 - \lambda h + o(h)$$
$$P\{N(t + h) - N(t) = 1\} = \lambda h + o(h)$$
$$P\{N(t + h) - N(t) \ge 2\} = o(h)$$

Conditioning on which of these three possibilities occurs gives that

$$\begin{aligned}
E[e^{-u[N(t+h) - N(t)]}] &= 1 - \lambda h + o(h) + e^{-u}(\lambda h + o(h)) + o(h) \\
&= 1 - \lambda h + e^{-u} \lambda h + o(h)
\end{aligned} \tag{5.11}$$

Therefore, from Equations (5.10) and (5.11) we obtain

$$g(t + h) = g(t)\left(1 + \lambda h (e^{-u} - 1) + o(h)\right)$$

which can be written as

$$\frac{g(t + h) - g(t)}{h} = g(t) \lambda (e^{-u} - 1) + \frac{o(h)}{h}$$

Letting $h \to 0$ yields the differential equation

$$g'(t) = g(t) \lambda (e^{-u} - 1)$$

or

$$\frac{g'(t)}{g(t)} = \lambda (e^{-u} - 1)$$

Noting that the left side is the derivative of $\log(g(t))$ yields, upon integration, that

$$\log(g(t)) = \lambda (e^{-u} - 1) t + C$$

Because $g(0) = E[e^{-uN(0)}] = 1$ it follows that $C = 0$, and so the Laplace transform
of $N(t)$ is

$$E[e^{-uN(t)}] = g(t) = e^{\lambda t (e^{-u} - 1)}$$


However, if $X$ is a Poisson random variable with mean $\lambda t$, then its Laplace transform is

$$\begin{aligned}
E[e^{-uX}] &= \sum_{i} e^{-ui} e^{-\lambda t} (\lambda t)^i / i! \\
&= e^{-\lambda t} \sum_{i} (\lambda t e^{-u})^i / i! = e^{-\lambda t} e^{\lambda t e^{-u}} = e^{\lambda t (e^{-u} - 1)}
\end{aligned}$$

Because the Laplace transform uniquely determines the distribution, we can thus
conclude that $N(t)$ is Poisson with mean $\lambda t$.

To show that $N(s + t) - N(s)$ is also Poisson with mean $\lambda t$, fix $s$ and let $N_s(t) =
N(s + t) - N(s)$ equal the number of events in the first $t$ time units when we start
our count at time $s$. It is now straightforward to verify that the counting process
$\{N_s(t), t \ge 0\}$ satisfies all the axioms for being a Poisson process with rate $\lambda$.
Consequently, by our preceding result, we can conclude that $N_s(t)$ is Poisson distributed with
mean $\lambda t$. ■
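
The key identity in the proof, that a Poisson random variable with mean $\lambda t$ has Laplace transform $e^{\lambda t (e^{-u} - 1)}$, can be spot-checked numerically. The sketch below sums the defining series directly; the truncation at 100 terms and the sample values of $u$ and the mean are arbitrary illustrative choices.

```python
import math

def laplace_by_sum(u, mean, terms=100):
    """Sum e^{-u i} * P{X = i} over i, building the Poisson pmf recursively."""
    total, pmf = 0.0, math.exp(-mean)   # pmf starts at P{X = 0}
    for i in range(terms):
        total += math.exp(-u * i) * pmf
        pmf *= mean / (i + 1)           # P{X = i+1} from P{X = i}
    return total

def laplace_closed_form(u, mean):
    """The closed form exp(mean * (e^{-u} - 1)) obtained in the proof."""
    return math.exp(mean * (math.exp(-u) - 1.0))

print(laplace_by_sum(0.5, 6.0), laplace_closed_form(0.5, 6.0))
```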
Remarks

(i) The result that $N(t)$, or more generally $N(t + s) - N(s)$, has a Poisson distribution
is a consequence of the Poisson approximation to the binomial distribution (see
Section 2.2.4). To see this, subdivide the interval $[0, t]$ into $k$ equal parts where $k$
is very large (Figure 5.1). Now it can be shown using axiom (iv) of Definition 5.2
that as $k$ increases to $\infty$ the probability of having two or more events in any of
the $k$ subintervals goes to 0. Hence, $N(t)$ will (with a probability going to 1) just
equal the number of subintervals in which an event occurs. However, by stationary
and independent increments this number will have a binomial distribution with
parameters $k$ and $p = \lambda t/k + o(t/k)$. Hence, by the Poisson approximation to the
binomial we see by letting $k$ approach $\infty$ that $N(t)$ will have a Poisson distribution
with mean equal to

$$\lim_{k \to \infty} k \left[ \lambda \frac{t}{k} + o\!\left(\frac{t}{k}\right) \right] = \lambda t + \lim_{k \to \infty} \frac{t\, o(t/k)}{t/k} = \lambda t$$

by using the definition of $o(h)$ and the fact that $t/k \to 0$ as $k \to \infty$.

(ii) Because the distribution of N(t + s) — N(s) is the same for all s, it follows that 
the Poisson process has stationary increments. 
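
The subdivision argument of Remark (i) can be illustrated numerically. In the sketch below the $o(t/k)$ term is ignored, so each of the $k$ subintervals is assumed to contain an event with probability exactly $\lambda t / k$; the values $\lambda = 2$ and $t = 1$ are made up.

```python
import math

def binomial_pmf(j, k, p):
    """P{exactly j of k independent subintervals contain an event}."""
    return math.comb(k, j) * p**j * (1.0 - p) ** (k - j)

def poisson_pmf(j, mean):
    return math.exp(-mean) * mean**j / math.factorial(j)

rate, t = 2.0, 1.0                      # hypothetical rate and interval length
for k in (10, 100, 10_000):
    # the binomial probability approaches the Poisson one as k grows
    print(k, binomial_pmf(3, k, rate * t / k), poisson_pmf(3, rate * t))
```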

5.3.3 Interarrival and Waiting Time Distributions 

Consider a Poisson process, and let us denote the time of the first event by $T_1$. Further,
for $n > 1$, let $T_n$ denote the elapsed time between the $(n - 1)$st and the $n$th event. The
sequence $\{T_n, n = 1, 2, \ldots\}$ is called the sequence of interarrival times. For instance,
if $T_1 = 5$ and $T_2 = 10$, then the first event of the Poisson process would have occurred
at time 5 and the second at time 15.

We shall now determine the distribution of the $T_n$. To do so, we first note that the
event $\{T_1 > t\}$ takes place if and only if no events of the Poisson process occur in the
interval $[0, t]$ and thus,

$$P\{T_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$

Hence, $T_1$ has an exponential distribution with mean $1/\lambda$. Now,

$$P\{T_2 > t\} = E[P\{T_2 > t \mid T_1\}]$$

However,

$$\begin{aligned}
P\{T_2 > t \mid T_1 = s\} &= P\{0 \text{ events in } (s, s + t] \mid T_1 = s\} \\
&= P\{0 \text{ events in } (s, s + t]\} \\
&= e^{-\lambda t}
\end{aligned} \tag{5.12}$$

where the last two equations followed from independent and stationary increments.
Therefore, from Equation (5.12) we conclude that $T_2$ is also an exponential random
variable with mean $1/\lambda$ and, furthermore, that $T_2$ is independent of $T_1$. Repeating the
same argument yields the following.

[Figure 5.1: the interval $[0, t]$ divided into $k$ equal subintervals, with tick marks at $0, t/k, 2t/k, \ldots, t = kt/k$.]

Proposition 5.1 $T_n$, $n = 1, 2, \ldots$, are independent identically distributed exponential
random variables having mean $1/\lambda$.

Remark The proposition should not surprise us. The assumption of stationary and 
independent increments is basically equivalent to asserting that, at any point in time, 
the process probabilistically restarts itself. That is, the process from any point on is 
independent of all that has previously occurred (by independent increments), and also 
has the same distribution as the original process (by stationary increments). In other 
words, the process has no memory, and hence exponential interarrival times are to be 
expected. 

Another quantity of interest is $S_n$, the arrival time of the $n$th event, also called the
waiting time until the $n$th event. It is easily seen that

$$S_n = \sum_{i=1}^{n} T_i, \quad n \ge 1$$

and hence from Proposition 5.1 and the results of Section 2.2 it follows that $S_n$ has a
gamma distribution with parameters $n$ and $\lambda$. That is, the probability density of $S_n$ is
given by

$$f_{S_n}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!}, \quad t \ge 0 \tag{5.13}$$

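Equation (5.13) may also be derived by noting that the $n$th event will occur prior to or
at time $t$ if and only if the number of events occurring by time $t$ is at least $n$. That is,

$$N(t) \ge n \iff S_n \le t$$

Hence,

$$F_{S_n}(t) = P\{S_n \le t\} = P\{N(t) \ge n\} = \sum_{j=n}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!}$$

which, upon differentiation, yields

$$\begin{aligned}
f_{S_n}(t) &= -\sum_{j=n}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^j}{j!} + \sum_{j=n}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^{j-1}}{(j - 1)!} \\
&= \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!} + \sum_{j=n+1}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^{j-1}}{(j - 1)!} - \sum_{j=n}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^j}{j!} \\
&= \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!}
\end{aligned}$$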
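
The relation between the distribution function $P\{N(t) \ge n\}$ and the gamma density can be checked with a finite difference. The sketch below is illustrative; the values $n = 3$, $\lambda = 2$, $t = 1.5$ and the step size are arbitrary choices.

```python
import math

def arrival_cdf(n, rate, t):
    """F_{S_n}(t) = P{N(t) >= n} = 1 - sum_{j < n} e^{-rate t} (rate t)^j / j!"""
    mean = rate * t
    below = sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(n))
    return 1.0 - below

def arrival_density(n, rate, t):
    """Gamma density of S_n with parameters n and rate."""
    return rate * math.exp(-rate * t) * (rate * t) ** (n - 1) / math.factorial(n - 1)

n, rate, t, h = 3, 2.0, 1.5, 1e-5
slope = (arrival_cdf(n, rate, t + h) - arrival_cdf(n, rate, t - h)) / (2 * h)
print(slope, arrival_density(n, rate, t))   # the two values should nearly agree
```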
Example 5.13 Suppose that people immigrate into a territory at a Poisson rate $\lambda = 1$
per day.

(a) What is the expected time until the tenth immigrant arrives?

(b) What is the probability that the elapsed time between the tenth and the eleventh
arrival exceeds two days?

Solution:

(a) $E[S_{10}] = 10/\lambda = 10$ days.

(b) $P\{T_{11} > 2\} = e^{-2\lambda} = e^{-2} \approx 0.135$. ■

Proposition 5.1 also gives us another way of defining a Poisson process. Suppose
we start with a sequence $\{T_n, n \ge 1\}$ of independent identically distributed exponential
random variables each having mean $1/\lambda$. Now let us define a counting process by saying
that the $n$th event of this process occurs at time

$$S_n = T_1 + T_2 + \cdots + T_n$$

The resultant counting process $\{N(t), t \ge 0\}$* will be Poisson with rate $\lambda$.
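
This construction translates directly into a simulation. The following sketch builds sample paths from independent exponential interarrival times and compares the sample mean and variance of $N(t)$ with $\lambda t$, which is what a Poisson count would give. The seed, rate, horizon, and number of runs are arbitrary illustrative values.

```python
import random

def simulate_count(rate, t, rng):
    """Return N(t) for one path built from i.i.d. exponential interarrival times."""
    count, clock = 0, rng.expovariate(rate)   # clock holds S_1 = T_1
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)        # advance to the next arrival time
    return count

rng = random.Random(0)
rate, t, runs = 2.0, 3.0, 20_000
counts = [simulate_count(rate, t, rng) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
print(mean, var)   # both should be near rate * t = 6 if N(t) is Poisson
```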

Remark Another way of obtaining the density function of $S_n$ is to note that because
$S_n$ is the time of the $n$th event,

$$\begin{aligned}
P\{t < S_n < t + h\} &= P\{N(t) = n - 1, \text{ one event in } (t, t + h)\} + o(h) \\
&= P\{N(t) = n - 1\}\, P\{\text{one event in } (t, t + h)\} + o(h) \\
&= e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!} [\lambda h + o(h)] + o(h) \\
&= \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!}\, h + o(h)
\end{aligned}$$

where the first equality uses the fact that the probability of 2 or more events in $(t, t + h)$
is $o(h)$. If we now divide both sides of the preceding equation by $h$ and then let $h \to 0$,
we obtain

$$f_{S_n}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!}$$

5.3.4 Further Properties of Poisson Processes 

Consider a Poisson process $\{N(t), t \ge 0\}$ having rate $\lambda$, and suppose that each time an
event occurs it is classified as either a type I or a type II event. Suppose further that each
event is classified as a type I event with probability $p$ or a type II event with probability
$1 - p$, independently of all other events. For example, suppose that customers arrive
at a store in accordance with a Poisson process having rate $\lambda$; and suppose that each
arrival is male with probability $\frac{1}{2}$ and female with probability $\frac{1}{2}$. Then a type I event
would correspond to a male arrival and a type II event to a female arrival.

* A formal definition of $N(t)$ is given by $N(t) = \max\{n : S_n \le t\}$ where $S_0 = 0$.

Let $N_1(t)$ and $N_2(t)$ denote respectively the number of type I and type II events
occurring in $[0, t]$. Note that $N(t) = N_1(t) + N_2(t)$.

Proposition 5.2 $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are both Poisson processes having
respective rates $\lambda p$ and $\lambda(1 - p)$. Furthermore, the two processes are independent.

Proof. It is easy to verify that $\{N_1(t), t \ge 0\}$ is a Poisson process with rate $\lambda p$ by
verifying that it satisfies Definition 5.2.

• $N_1(0) = 0$ follows from the fact that $N(0) = 0$.

• It is easy to see that $\{N_1(t), t \ge 0\}$ inherits the stationary and independent increment
properties of the process $\{N(t), t \ge 0\}$. This is true because the distribution of
the number of type I events in an interval can be obtained by conditioning on the
number of events in that interval, and the distribution of this latter quantity depends
only on the length of the interval and is independent of what has occurred in any
nonoverlapping interval.

• $$\begin{aligned}
P\{N_1(h) = 1\} &= P\{N_1(h) = 1 \mid N(h) = 1\}\, P\{N(h) = 1\} \\
&\quad + P\{N_1(h) = 1 \mid N(h) \ge 2\}\, P\{N(h) \ge 2\} \\
&= p(\lambda h + o(h)) + o(h) \\
&= \lambda p h + o(h)
\end{aligned}$$

• $P\{N_1(h) \ge 2\} \le P\{N(h) \ge 2\} = o(h)$

Thus we see that $\{N_1(t), t \ge 0\}$ is a Poisson process with rate $\lambda p$ and, by a similar
argument, that $\{N_2(t), t \ge 0\}$ is a Poisson process with rate $\lambda(1 - p)$. Because the
probability of a type I event in the interval from $t$ to $t + h$ is independent of all that
occurs in intervals that do not overlap $(t, t + h)$, it is independent of knowledge of
when type II events occur, showing that the two Poisson processes are independent.
(For another way of proving independence, see Example 3.23.) ■
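
The independence claim can be checked at the level of probabilities: conditioning on the total count and splitting it binomially should give the same joint probability as multiplying two Poisson probabilities. The sketch below does exactly that for one made-up set of parameters; the function names are illustrative.

```python
import math

def poisson_pmf(j, mean):
    return math.exp(-mean) * mean**j / math.factorial(j)

def joint_by_conditioning(a, b, rate, p, t):
    """P{N1(t) = a, N2(t) = b}: a + b events in total, of which a are type I."""
    total = a + b
    return poisson_pmf(total, rate * t) * math.comb(total, a) * p**a * (1 - p) ** b

def joint_by_product(a, b, rate, p, t):
    """Product of the two Poisson marginals asserted by Proposition 5.2."""
    return poisson_pmf(a, rate * p * t) * poisson_pmf(b, rate * (1 - p) * t)

print(joint_by_conditioning(2, 3, 1.5, 0.3, 2.0), joint_by_product(2, 3, 1.5, 0.3, 2.0))
```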

Example 5.14 If immigrants to area A arrive at a Poisson rate of ten per week, and if
each immigrant is of English descent with probability $\frac{1}{12}$, then what is the probability
that no people of English descent will emigrate to area A during the month of February?

Solution: By the previous proposition it follows that the number of Englishmen
emigrating to area A during the month of February is Poisson distributed with mean
$4 \cdot 10 \cdot \frac{1}{12} = \frac{10}{3}$. Hence, the desired probability is $e^{-10/3}$. ■

Example 5.15 Suppose nonnegative offers to buy an item that you want to sell arrive
according to a Poisson process with rate $\lambda$. Assume that each offer is the value of a
continuous random variable having density function f(x). Once the offer is presented 
to you, you must either accept it or reject it and wait for the next offer. We suppose that 
you incur costs at a rate c per unit time until the item is sold, and that your objective is 
to maximize your expected total return, where the total return is equal to the amount 
received minus the total cost incurred. Suppose you employ the policy of accepting the 
first offer that is greater than some specified value y. (Such a type of policy, which we 
call a y-policy, can be shown to be optimal.) What is the best value of y? What is the 
maximal expected net return? 
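Solution: Let us compute the expected total return when you use the $y$-policy, and
then choose the value of $y$ that maximizes this quantity. Let $X$ denote the value of
a random offer, and let $\bar{F}(x) = P\{X > x\} = \int_x^\infty f(u)\,du$ be its tail distribution
function. Because each offer will be greater than $y$ with probability $\bar{F}(y)$, it follows
that such offers occur according to a Poisson process with rate $\lambda \bar{F}(y)$. Hence, the
time until an offer is accepted is an exponential random variable with rate $\lambda \bar{F}(y)$.
Letting $R(y)$ denote the total return from the policy that accepts the first offer that
is greater than $y$, we have

$$\begin{aligned}
E[R(y)] &= E[\text{accepted offer}] - c\,E[\text{time to accept}] \\
&= E[X \mid X > y] - \frac{c}{\lambda \bar{F}(y)} \\
&= \int_y^\infty x \frac{f(x)}{\bar{F}(y)}\,dx - \frac{c}{\lambda \bar{F}(y)} \\
&= \frac{\int_y^\infty x f(x)\,dx - c/\lambda}{\bar{F}(y)}
\end{aligned} \tag{5.14}$$

Differentiation yields

$$\frac{d}{dy} E[R(y)] = 0 \iff -\bar{F}(y)\, y f(y) + \left( \int_y^\infty x f(x)\,dx - \frac{c}{\lambda} \right) f(y) = 0$$

Therefore, the optimal value of $y$ satisfies

$$y \bar{F}(y) = \int_y^\infty x f(x)\,dx - \frac{c}{\lambda}$$

or

$$y \int_y^\infty f(x)\,dx = \int_y^\infty x f(x)\,dx - \frac{c}{\lambda}$$

or

$$\int_y^\infty (x - y) f(x)\,dx = \frac{c}{\lambda}$$

It is not difficult to show that there is a unique value of $y$ that satisfies the preceding.
Hence, the optimal policy is the one that accepts the first offer that is greater than
$y^*$, where $y^*$ is such that

$$\int_{y^*}^\infty (x - y^*) f(x)\,dx = \frac{c}{\lambda}$$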
Putting $y = y^*$ in Equation (5.14) shows that the maximal expected net return is

$$\begin{aligned}
E[R(y^*)] &= \frac{1}{\bar{F}(y^*)} \left( \int_{y^*}^\infty (x - y^* + y^*) f(x)\,dx - c/\lambda \right) \\
&= \frac{1}{\bar{F}(y^*)} \left( \int_{y^*}^\infty (x - y^*) f(x)\,dx + y^* \int_{y^*}^\infty f(x)\,dx - c/\lambda \right) \\
&= \frac{1}{\bar{F}(y^*)} \left( c/\lambda + y^* \bar{F}(y^*) - c/\lambda \right) \\
&= y^*
\end{aligned}$$

Thus, the optimal critical value is also the maximal expected net return. To understand
why this is so, let $m$ be the maximal expected net return, and note that when an offer
is rejected the problem basically starts anew and so the maximal expected additional
net return from then on is $m$. But this implies that it is optimal to accept an offer
if and only if it is at least as large as $m$, showing that $m$ is the optimal critical
value. ■
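
To see the result in a concrete case, suppose (as an assumption made only for this illustration) that offers are exponential with density $f(x) = \mu e^{-\mu x}$. Then $\int_y^\infty (x - y) f(x)\,dx = e^{-\mu y}/\mu$, so the optimality condition gives $y^* = \frac{1}{\mu} \log\frac{\lambda}{c\mu}$ whenever this is positive, and Equation (5.14) reduces to $E[R(y)] = y + 1/\mu - (c/\lambda) e^{\mu y}$. The sketch below checks that the maximal return equals $y^*$; all parameter values are hypothetical.

```python
import math

def optimal_threshold(rate, cost, mu):
    """y* solving e^{-mu y}/mu = cost/rate; clamped at 0 when no positive root exists."""
    return max(0.0, math.log(rate / (cost * mu)) / mu)

def expected_return(y, rate, cost, mu):
    """Equation (5.14) specialized to exponential(mu) offers."""
    return y + 1.0 / mu - (cost / rate) * math.exp(mu * y)

rate, cost, mu = 2.0, 1.0, 0.5          # made-up offer rate, holding cost, offer parameter
y_star = optimal_threshold(rate, cost, mu)
print(y_star, expected_return(y_star, rate, cost, mu))   # the two values coincide
```

When the clamp at 0 is active (every offer should be accepted), the equality between the threshold and the expected return no longer applies.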

It follows from Proposition 5.2 that if each of a Poisson number of individuals is 
independently classified into one of two possible groups with respective probabilities p 
and 1 — p, then the number of individuals in each of the two groups will be independent 
Poisson random variables. Because this result easily generalizes to the case where the 
classification is into any one of r possible groups, we have the following application 
to a model of employees moving about in an organization. 

Example 5.16 Consider a system in which individuals at any time are classified as
being in one of $r$ possible states, and assume that an individual changes states in
accordance with a Markov chain having transition probabilities $P_{ij}$, $i, j = 1, \ldots, r$. That
is, if an individual is in state $i$ during a time period then, independently of its previous
states, it will be in state $j$ during the next time period with probability $P_{ij}$. The
individuals are assumed to move through the system independently of each other. Suppose that
the numbers of people initially in states $1, 2, \ldots, r$ are independent Poisson random
variables with respective means $\lambda_1, \lambda_2, \ldots, \lambda_r$. We are interested in determining the
joint distribution of the numbers of individuals in states $1, 2, \ldots, r$ at some time $n$.

Solution: For fixed $i$, let $N_j(i)$, $j = 1, \ldots, r$ denote the number of those
individuals, initially in state $i$, that are in state $j$ at time $n$. Now each of the (Poisson
distributed) number of people initially in state $i$ will, independently of each other,
be in state $j$ at time $n$ with probability $P^n_{ij}$, where $P^n_{ij}$ is the $n$-stage transition
probability for the Markov chain having transition probabilities $P_{ij}$. Hence, the
$N_j(i)$, $j = 1, \ldots, r$ will be independent Poisson random variables with respective
means $\lambda_i P^n_{ij}$, $j = 1, \ldots, r$. Because the sum of independent Poisson random
variables is itself a Poisson random variable, it follows that the number of individuals
in state $j$ at time $n$, namely $\sum_{i=1}^{r} N_j(i)$, will be independent Poisson random
variables with respective means $\sum_{i} \lambda_i P^n_{ij}$, for $j = 1, \ldots, r$. ■
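
The means $\sum_i \lambda_i P^n_{ij}$ are the entries of the row vector of initial means multiplied by the $n$th power of the transition matrix, so they can be computed by repeated vector-matrix multiplication. The sketch below uses a made-up two-state chain; the function names are illustrative.

```python
def step_means(means, P):
    """One time period: new_mean[j] = sum_i means[i] * P[i][j]."""
    r = len(means)
    return [sum(means[i] * P[i][j] for i in range(r)) for j in range(r)]

def means_at_time(initial_means, P, n):
    """Poisson means of the state counts after n periods."""
    means = list(initial_means)
    for _ in range(n):
        means = step_means(means, P)
    return means

P = [[0.5, 0.5], [0.2, 0.8]]            # hypothetical transition probabilities
print(means_at_time([10.0, 20.0], P, 1))
print(means_at_time([10.0, 20.0], P, 5))
```

Because each row of the transition matrix sums to 1, the total of the means stays at its initial value for every $n$.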

Example 5.17 (The Coupon Collecting Problem) There are $m$ different types of
coupons. Each time a person collects a coupon it is, independently of ones previously
obtained, a type $j$ coupon with probability $p_j$, $\sum_{j=1}^{m} p_j = 1$. Let $N$ denote the number
of coupons one needs to collect in order to have a complete collection of at least one
of each type. Find $E[N]$.

Solution: If we let $N_j$ denote the number one must collect to obtain a type $j$ coupon,
then we can express $N$ as

$$N = \max_{1 \le j \le m} N_j$$

However, even though each $N_j$ is geometric with parameter $p_j$, the foregoing
representation of $N$ is not that useful, because the random variables $N_j$ are not
independent.

We can, however, transform the problem into one of determining the expected
value of the maximum of independent random variables. To do so, suppose that
coupons are collected at times chosen according to a Poisson process with rate
$\lambda = 1$. Say that an event of this Poisson process is of type $j$, $1 \le j \le m$, if the
coupon obtained at that time is a type $j$ coupon. If we now let $N_j(t)$ denote the
number of type $j$ coupons collected by time $t$, then it follows from Proposition 5.2
that $\{N_j(t), t \ge 0\}$, $j = 1, \ldots, m$ are independent Poisson processes with respective
rates $\lambda p_j = p_j$. Let $X_j$ denote the time of the first event of the $j$th process, and let

$$X = \max_{1 \le j \le m} X_j$$

denote the time at which a complete collection is amassed. Since the $X_j$ are
independent exponential random variables with respective rates $p_j$, it follows that

$$\begin{aligned}
P\{X < t\} &= P\{\max_{1 \le j \le m} X_j < t\} \\
&= P\{X_j < t, \text{ for } j = 1, \ldots, m\} \\
&= \prod_{j=1}^{m} (1 - e^{-p_j t})
\end{aligned}$$

Therefore,

$$\begin{aligned}
E[X] &= \int_0^\infty P\{X > t\}\,dt \\
&= \int_0^\infty \left\{ 1 - \prod_{j=1}^{m} (1 - e^{-p_j t}) \right\} dt
\end{aligned} \tag{5.15}$$

It remains to relate $E[X]$, the expected time until one has a complete set, to $E[N]$,
the expected number of coupons it takes. This can be done by letting $T_i$ denote the $i$th
interarrival time of the Poisson process that counts the number of coupons obtained.
Then it is easy to see that

$$X = \sum_{i=1}^{N} T_i$$

Since the $T_i$ are independent exponentials with rate 1, and $N$ is independent of the
$T_i$, we see that

$$E[X \mid N] = N E[T_i] = N$$

Therefore,

$$E[X] = E[N]$$

and so $E[N]$ is as given in Equation (5.15).
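
Equation (5.15) can be evaluated exactly by expanding the product and integrating term by term, which gives $E[N] = \sum_{S \ne \emptyset} (-1)^{|S|+1} / \sum_{j \in S} p_j$, the sum running over nonempty subsets $S$ of coupon types. The sketch below implements this expansion; it enumerates all subsets, so it is only practical for small $m$, and the probabilities used are made up.

```python
from itertools import combinations

def expected_coupons(probs):
    """E[N] from (5.15) via inclusion-exclusion over nonempty subsets of types."""
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(probs, size):
            total += sign / sum(subset)   # integral of e^{-(sum of p_j) t} over t >= 0
    return total

print(expected_coupons([0.5, 0.5]))          # 3.0
print(expected_coupons([0.5, 0.3, 0.2]))     # unequal types take longer on average
```

With $m$ equally likely types this reproduces the familiar value $m(1 + 1/2 + \cdots + 1/m)$.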

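Let us now compute the expected number of types that appear only once in the
complete collection. Letting $I_i$ equal 1 if there is only a single type $i$ coupon in the
final set, and letting it equal 0 otherwise, we thus want

$$E\left[ \sum_{i=1}^{m} I_i \right] = \sum_{i=1}^{m} E[I_i] = \sum_{i=1}^{m} P\{I_i = 1\}$$

Now there will be a single type $i$ coupon in the final set if a coupon of each type has
appeared before the second coupon of type $i$ is obtained. Thus, letting $S_i$ denote the
time at which the second type $i$ coupon is obtained, we have

$$P\{I_i = 1\} = P\{X_j < S_i, \text{ for all } j \ne i\}$$

Using that $S_i$ has a gamma distribution with parameters $(2, p_i)$, this yields

$$\begin{aligned}
P\{I_i = 1\} &= \int_0^\infty P\{X_j < S_i \text{ for all } j \ne i \mid S_i = x\}\, p_i e^{-p_i x} p_i x\,dx \\
&= \int_0^\infty P\{X_j < x, \text{ for all } j \ne i\}\, p_i^2 x e^{-p_i x}\,dx \\
&= \int_0^\infty \prod_{j \ne i} (1 - e^{-p_j x})\, p_i^2 x e^{-p_i x}\,dx
\end{aligned}$$

Therefore, we have the result

$$\begin{aligned}
E\left[ \sum_{i=1}^{m} I_i \right] &= \int_0^\infty \sum_{i=1}^{m} \prod_{j \ne i} (1 - e^{-p_j x})\, p_i^2 x e^{-p_i x}\,dx \\
&= \int_0^\infty x \prod_{j=1}^{m} (1 - e^{-p_j x}) \sum_{i=1}^{m} p_i^2 \frac{e^{-p_i x}}{1 - e^{-p_i x}}\,dx
\end{aligned}$$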


The next probability calculation related to Poisson processes that we shall determine
is the probability that $n$ events occur in one Poisson process before $m$ events have
occurred in a second and independent Poisson process. More formally let $\{N_1(t), t \ge 0\}$
and $\{N_2(t), t \ge 0\}$ be two independent Poisson processes having respective rates $\lambda_1$
and $\lambda_2$. Also, let $S_n^1$ denote the time of the $n$th event of the first process, and $S_m^2$ the
time of the $m$th event of the second process. We seek

$$P\{S_n^1 < S_m^2\}$$

Before attempting to calculate this for general $n$ and $m$, let us consider the special
case $n = m = 1$. Since $S_1^1$, the time of the first event of the $N_1(t)$ process, and $S_1^2$, the
time of the first event of the $N_2(t)$ process, are both exponentially distributed random
variables (by Proposition 5.1) with respective means $1/\lambda_1$ and $1/\lambda_2$, it follows from
Section 5.2.3 that

$$P\{S_1^1 < S_1^2\} = \frac{\lambda_1}{\lambda_1 + \lambda_2} \tag{5.16}$$


Let us now consider the probability that two events occur in the $N_1(t)$ process before
a single event has occurred in the $N_2(t)$ process. That is, $P\{S_2^1 < S_1^2\}$. To calculate
this we reason as follows: In order for the $N_1(t)$ process to have two events before a
single event occurs in the $N_2(t)$ process, it is first necessary for the initial event that
occurs to be an event of the $N_1(t)$ process (and this occurs, by Equation (5.16), with
probability $\lambda_1/(\lambda_1 + \lambda_2)$). Now, given that the initial event is from the $N_1(t)$ process,
the next thing that must occur for $S_2^1$ to be less than $S_1^2$ is for the second event also to
be an event of the $N_1(t)$ process. However, when the first event occurs both processes
start all over again (by the memoryless property of Poisson processes) and hence this
conditional probability is also $\lambda_1/(\lambda_1 + \lambda_2)$; thus, the desired probability is given by

$$P\{S_2^1 < S_1^2\} = \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^2$$

In fact, this reasoning shows that each event that occurs is going to be an event
of the $N_1(t)$ process with probability $\lambda_1/(\lambda_1 + \lambda_2)$ or an event of the $N_2(t)$ process
with probability $\lambda_2/(\lambda_1 + \lambda_2)$, independent of all that has previously occurred. In other
words, the probability that the $N_1(t)$ process reaches $n$ before the $N_2(t)$ process reaches
$m$ is just the probability that $n$ heads will appear before $m$ tails if one flips a coin having
probability $p = \lambda_1/(\lambda_1 + \lambda_2)$ of a head appearing. But by noting that this event will
occur if and only if the first $n + m - 1$ tosses result in $n$ or more heads, we see that our
desired probability is given by

$$P\{S_n^1 < S_m^2\} = \sum_{k=n}^{n+m-1} \binom{n+m-1}{k} \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^k \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} \right)^{n+m-1-k}$$
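
The coin-flipping argument leads to a binomial tail sum that is easy to evaluate. The sketch below computes the probability of at least $n$ heads in $n + m - 1$ tosses with head probability $\lambda_1/(\lambda_1 + \lambda_2)$; the function name and the sample rates are illustrative.

```python
import math

def prob_n_before_m(n, m, rate1, rate2):
    """P{n events of process 1 occur before m events of process 2}."""
    p = rate1 / (rate1 + rate2)          # chance that any given event is from process 1
    tosses = n + m - 1
    return sum(math.comb(tosses, k) * p**k * (1 - p) ** (tosses - k)
               for k in range(n, tosses + 1))

print(prob_n_before_m(1, 1, 2.0, 3.0))   # reduces to rate1 / (rate1 + rate2)
print(prob_n_before_m(2, 1, 2.0, 3.0))   # reduces to the square of that ratio
```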



5.3.5 Conditional Distribution of the Arrival Times 

Suppose we are told that exactly one event of a Poisson process has taken place by time $t$,
and we are asked to determine the distribution of the time at which the event occurred.
Now, since a Poisson process possesses stationary and independent increments it seems
reasonable that each interval in $[0, t]$ of equal length should have the same probability
of containing the event. In other words, the time of the event should be uniformly
distributed over $[0, t]$. This is easily checked since, for $s \le t$,

$$\begin{aligned}
P\{T_1 < s \mid N(t) = 1\} &= \frac{P\{T_1 < s, N(t) = 1\}}{P\{N(t) = 1\}} \\
&= \frac{P\{1 \text{ event in } [0, s),\ 0 \text{ events in } [s, t]\}}{P\{N(t) = 1\}} \\
&= \frac{P\{1 \text{ event in } [0, s)\}\, P\{0 \text{ events in } [s, t]\}}{P\{N(t) = 1\}} \\
&= \frac{\lambda s e^{-\lambda s} e^{-\lambda (t - s)}}{\lambda t e^{-\lambda t}} \\
&= \frac{s}{t}
\end{aligned}$$


This result may be generalized, but before doing so we need to introduce the concept 
of order statistics. 

Let $Y_1, Y_2, \ldots, Y_n$ be $n$ random variables. We say that $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ are the
order statistics corresponding to $Y_1, Y_2, \ldots, Y_n$ if $Y_{(k)}$ is the $k$th smallest value among
$Y_1, \ldots, Y_n$, $k = 1, 2, \ldots, n$. For instance, if $n = 3$ and $Y_1 = 4$, $Y_2 = 5$, $Y_3 = 1$ then
$Y_{(1)} = 1$, $Y_{(2)} = 4$, $Y_{(3)} = 5$. If the $Y_i$, $i = 1, \ldots, n$, are independent identically
distributed continuous random variables with probability density $f$, then the joint density
of the order statistics $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ is given by

$$f(y_1, y_2, \ldots, y_n) = n! \prod_{i=1}^{n} f(y_i), \quad y_1 < y_2 < \cdots < y_n$$

The preceding follows since

(i) $(Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})$ will equal $(y_1, y_2, \ldots, y_n)$ if $(Y_1, Y_2, \ldots, Y_n)$ is equal to any
of the $n!$ permutations of $(y_1, y_2, \ldots, y_n)$;

and

(ii) the probability density that $(Y_1, Y_2, \ldots, Y_n)$ is equal to $y_{i_1}, \ldots, y_{i_n}$ is $\prod_{j=1}^{n} f(y_{i_j}) =
\prod_{j=1}^{n} f(y_j)$ when $i_1, \ldots, i_n$ is a permutation of $1, 2, \ldots, n$.

If the $Y_i$, $i = 1, \ldots, n$, are uniformly distributed over $(0, t)$, then we obtain from
the preceding that the joint density function of the order statistics $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$
is

$$f(y_1, y_2, \ldots, y_n) = \frac{n!}{t^n}, \quad 0 < y_1 < y_2 < \cdots < y_n < t$$

We are now ready for the following useful theorem. 

Theorem 5.2 Given that $N(t) = n$, the $n$ arrival times $S_1, \ldots, S_n$ have the same
distribution as the order statistics corresponding to $n$ independent random variables
uniformly distributed on the interval $(0, t)$.

Proof. To obtain the conditional density of $S_1, \ldots, S_n$ given that $N(t) = n$ note that
for $0 < s_1 < \cdots < s_n < t$ the event that $S_1 = s_1, S_2 = s_2, \ldots, S_n = s_n, N(t) = n$
is equivalent to the event that the first $n + 1$ interarrival times satisfy $T_1 = s_1$, $T_2 =
s_2 - s_1, \ldots, T_n = s_n - s_{n-1}$, $T_{n+1} > t - s_n$. Hence, using Proposition 5.1, we have
that the conditional joint density of $S_1, \ldots, S_n$ given that $N(t) = n$ is as follows:

$$\begin{aligned}
f(s_1, \ldots, s_n \mid n) &= \frac{f(s_1, \ldots, s_n, n)}{P\{N(t) = n\}} \\
&= \frac{\lambda e^{-\lambda s_1}\, \lambda e^{-\lambda (s_2 - s_1)} \cdots \lambda e^{-\lambda (s_n - s_{n-1})}\, e^{-\lambda (t - s_n)}}{e^{-\lambda t} (\lambda t)^n / n!} \\
&= \frac{n!}{t^n}, \quad 0 < s_1 < \cdots < s_n < t
\end{aligned}$$

which proves the result. ■


Remark The preceding result is usually paraphrased as stating that, under the
condition that $n$ events have occurred in $(0, t)$, the times $S_1, \ldots, S_n$ at which events occur,
considered as unordered random variables, are distributed independently and uniformly
in the interval $(0, t)$.
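
Theorem 5.2 suggests a simple way to sample arrival times once the count is known: draw $n$ independent uniforms on $(0, t)$ and sort them. The sketch below does this and compares the sample mean of the first arrival time with $t/(n + 1)$, the mean of the smallest of $n$ uniforms. The seed and parameter values are arbitrary.

```python
import random

def arrival_times_given_count(n, t, rng):
    """Sample (S_1, ..., S_n) given N(t) = n as sorted Uniform(0, t) draws."""
    return sorted(rng.uniform(0.0, t) for _ in range(n))

rng = random.Random(1)
n, t, runs = 3, 1.0, 20_000
first = [arrival_times_given_count(n, t, rng)[0] for _ in range(runs)]
mean_first = sum(first) / runs
print(mean_first)   # theory: E[S_1 | N(t) = 3] = t / (n + 1) = 0.25
```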

Application of Theorem 5.2 (Sampling a Poisson Process) In Proposition 5.2 we
showed that if each event of a Poisson process is independently classified as a type I
event with probability $p$ and as a type II event with probability $1 - p$ then the counting
processes of type I and type II events are independent Poisson processes with respective
rates $\lambda p$ and $\lambda(1 - p)$. Suppose now, however, that there are $k$ possible types of events
and that the probability that an event is classified as a type $i$ event, $i = 1, \ldots, k$, depends
on the time the event occurs. Specifically, suppose that if an event occurs at time $y$ then
it will be classified as a type $i$ event, independently of anything that has previously
occurred, with probability $P_i(y)$, $i = 1, \ldots, k$ where $\sum_{i=1}^{k} P_i(y) = 1$. Upon using
Theorem 5.2 we can prove the following useful proposition.

Proposition 5.3 If $N_i(t)$, $i = 1, \ldots, k$, represents the number of type $i$ events
occurring by time $t$ then $N_i(t)$, $i = 1, \ldots, k$, are independent Poisson random variables
having means

$$E[N_i(t)] = \lambda \int_0^t P_i(s)\,ds$$

Before proving this proposition, let us first illustrate its use. 

Example 5.18 (An Infinite Server Queue) Suppose that customers arrive at a service
station in accordance with a Poisson process with rate $\lambda$. Upon arrival the customer is
immediately served by one of an infinite number of possible servers, and the service
times are assumed to be independent with a common distribution $G$. What is the
distribution of $X(t)$, the number of customers that have completed service by time $t$?
What is the distribution of $Y(t)$, the number of customers that are being served at time $t$?
To answer the preceding questions let us agree to call an entering customer a type I
customer if he completes his service by time $t$ and a type II customer if he does not
complete his service by time $t$. Now, if the customer enters at time $s$, $s \le t$, then he
will be a type I customer if his service time is less than $t - s$. Since the service time
distribution is $G$, the probability of this will be $G(t - s)$. Similarly, a customer entering
at time $s$, $s \le t$, will be a type II customer with probability $\bar{G}(t - s) = 1 - G(t - s)$.
Hence, from Proposition 5.3 we obtain that the distribution of $X(t)$, the number of
customers that have completed service by time $t$, is Poisson distributed with mean

$$E[X(t)] = \lambda \int_0^t G(t - s)\, ds = \lambda \int_0^t G(y)\, dy \tag{5.17}$$

Similarly, the distribution of $Y(t)$, the number of customers being served at time $t$, is
Poisson with mean

$$E[Y(t)] = \lambda \int_0^t \bar{G}(t - s)\, ds = \lambda \int_0^t \bar{G}(y)\, dy \tag{5.18}$$

Furthermore, $X(t)$ and $Y(t)$ are independent.
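As a numerical illustration of Equations (5.17) and (5.18), the sketch below evaluates both means with a simple midpoint rule. The helper names, the arrival rate of 3, and the exponential service distribution with mean 2 are assumptions made only for this example. Since every arrival by time $t$ is either finished or still in service, the two means should add up to $\lambda t$.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def infinite_server_means(lam, G, t):
    """Return (E[X(t)], E[Y(t)]) as given by Equations (5.17) and (5.18)."""
    completed = lam * integrate(G, 0.0, t)
    in_service = lam * integrate(lambda y: 1.0 - G(y), 0.0, t)
    return completed, in_service

# Hypothetical inputs: rate 3 per unit time, exponential service with mean 2.
lam, mean_service, t = 3.0, 2.0, 5.0
G = lambda y: 1.0 - math.exp(-y / mean_service)
ex, ey = infinite_server_means(lam, G, t)
print(ex, ey, ex + ey)  # the sum should be close to lam * t = 15
```

For exponential service the second mean also has the closed form $\lambda \mu (1 - e^{-t/\mu})$, where $\mu$ is the mean service time, which gives an independent check on the quadrature.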

Suppose now that we are interested in computing the joint distribution of $Y(t)$ and
$Y(t + s)$, that is, the joint distribution of the number in the system at time $t$ and at
time $t + s$. To accomplish this, say that an arrival is

type 1: if he arrives before time $t$ and completes service between $t$ and $t + s$,
type 2: if he arrives before $t$ and completes service after $t + s$,
type 3: if he arrives between $t$ and $t + s$ and completes service after $t + s$,
type 4: otherwise.

Hence, an arrival at time $y$ will be type $i$ with probability $P_i(y)$ given by

$$
P_1(y) = \begin{cases} G(t + s - y) - G(t - y), & \text{if } y < t \\ 0, & \text{otherwise} \end{cases}
$$

$$
P_2(y) = \begin{cases} \bar{G}(t + s - y), & \text{if } y < t \\ 0, & \text{otherwise} \end{cases}
$$

$$
P_3(y) = \begin{cases} \bar{G}(t + s - y), & \text{if } t < y < t + s \\ 0, & \text{otherwise} \end{cases}
$$

$$
P_4(y) = 1 - P_1(y) - P_2(y) - P_3(y)
$$

Thus, if $N_i = N_i(s + t)$, $i = 1, 2, 3$, denotes the number of type $i$ events that occur,
then from Proposition 5.3, $N_i$, $i = 1, 2, 3$, are independent Poisson random variables
with respective means

$$E[N_i] = \lambda \int_0^{t+s} P_i(y)\, dy, \qquad i = 1, 2, 3$$

Because

$$Y(t) = N_1 + N_2, \qquad Y(t + s) = N_2 + N_3$$
it is now an easy matter to compute the joint distribution of $Y(t)$ and $Y(t + s)$. For
instance,

$$
\begin{aligned}
\mathrm{Cov}[Y(t), Y(t + s)] &= \mathrm{Cov}(N_1 + N_2, N_2 + N_3) \\
&= \mathrm{Cov}(N_2, N_2) \qquad \text{by independence of } N_1, N_2, N_3 \\
&= \mathrm{Var}(N_2) \\
&= \lambda \int_0^t \bar{G}(t + s - y)\, dy = \lambda \int_0^t \bar{G}(u + s)\, du
\end{aligned}
$$

where the last equality follows since the variance of a Poisson random variable equals
its mean, and from the substitution $u = t - y$. Also, the joint distribution of $Y(t)$ and
$Y(t + s)$ is as follows:

$$
\begin{aligned}
P\{Y(t) = i, Y(t + s) = j\} &= P\{N_1 + N_2 = i, N_2 + N_3 = j\} \\
&= \sum_{l=0}^{\min(i,j)} P\{N_2 = l, N_1 = i - l, N_3 = j - l\} \\
&= \sum_{l=0}^{\min(i,j)} P\{N_2 = l\} P\{N_1 = i - l\} P\{N_3 = j - l\} \qquad \blacksquare
\end{aligned}
$$
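The joint mass function above is a finite sum of products of Poisson probabilities, so it is easy to evaluate directly. The sketch below does so for hypothetical means $m_1, m_2, m_3$ of $N_1, N_2, N_3$ (the values 1.2, 0.7, 1.5 and the truncation at 40 are assumptions for illustration) and recovers the covariance numerically, which should agree with $\mathrm{Var}(N_2) = m_2$.

```python
import math

def poisson_pmf(k, mean):
    """P{N = k} for a Poisson random variable N with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def joint_pmf(i, j, m1, m2, m3):
    """P{Y(t) = i, Y(t+s) = j} with Y(t) = N1 + N2 and Y(t+s) = N2 + N3,
    where N1, N2, N3 are independent Poisson with means m1, m2, m3."""
    return sum(
        poisson_pmf(l, m2) * poisson_pmf(i - l, m1) * poisson_pmf(j - l, m3)
        for l in range(min(i, j) + 1)
    )

# Hypothetical means; in the example they would be lam * integral of P_i(y) dy.
m1, m2, m3 = 1.2, 0.7, 1.5
total = e_i = e_j = e_ij = 0.0
for i in range(40):          # truncation point 40 leaves a negligible tail
    for j in range(40):
        p = joint_pmf(i, j, m1, m2, m3)
        total += p
        e_i += i * p
        e_j += j * p
        e_ij += i * j * p
covariance = e_ij - e_i * e_j
print(total, covariance)     # total near 1, covariance near m2
```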

Example 5.19 (A One Lane Road with No Overtaking) Consider a one lane road
with a single entrance and a single exit point which are of distance $L$ from each other
(see Figure 5.2). Suppose that cars enter this road according to a Poisson process with
rate $\lambda$, and that each entering car has an attached random value $V$ which represents the
velocity at which the car will travel, with the proviso that whenever the car encounters
a slower moving car it must decrease its speed to that of the slower moving car. Let $V_i$
denote the velocity value of the $i$th car to enter the road, and suppose that $V_i$, $i \ge 1$,
are independent and identically distributed and, in addition, are independent of the
counting process of cars entering the road. Assuming that the road is empty at time 0,
we are interested in determining

(a) the probability mass function of $R(t)$, the number of cars on the road at time $t$; and

(b) the distribution of the road traversal time of a car that enters the road at time $y$.

Solution: Let $T_i = L/V_i$ denote the time it would take car $i$ to travel the road if
it were empty when car $i$ arrived. Call $T_i$ the free travel time of car $i$, and note that
$T_1, T_2, \ldots$ are independent with distribution function

$$G(x) = P(T_i \le x) = P(L/V_i \le x) = P(V_i \ge L/x)$$

Figure 5.2 Cars enter at point a and depart at b.

Let us say that an event occurs each time that a car enters the road. Also, let $t$ be
a fixed value, and say that an event that occurs at time $s$ is a type 1 event if both
$s \le t$ and the free travel time of the car entering the road at time $s$ exceeds $t - s$.
In other words, a car entering the road is a type 1 event if the car would be on the
road at time $t$ even if the road were empty when it entered. Note that, independent
of all that occurred prior to time $s$, an event occurring at time $s$ is a type 1 event with
probability

$$
P(s) = \begin{cases} \bar{G}(t - s), & \text{if } s \le t \\ 0, & \text{if } s > t \end{cases}
$$

Letting $N_1(y)$ denote the number of type 1 events that occur by time $y$, it follows
from Proposition 5.3 that $N_1(y)$ is, for $y \le t$, a Poisson random variable with mean

$$E[N_1(y)] = \lambda \int_0^y \bar{G}(t - s)\, ds, \qquad y \le t$$

Because there will be no cars on the road at time $t$ if and only if $N_1(t) = 0$, it follows
that

$$P(R(t) = 0) = P(N_1(t) = 0) = e^{-\lambda \int_0^t \bar{G}(t - s)\, ds} = e^{-\lambda \int_0^t \bar{G}(u)\, du}$$

To determine $P(R(t) = n)$ for $n > 0$ we will condition on when the first type 1
event occurs. With $X$ equal to the time of the first type 1 event (or to $\infty$ if there are
no type 1 events), its distribution function is obtained by noting that

$$X \le y \iff N_1(y) > 0$$

thus showing that

$$F_X(y) = P(X \le y) = P(N_1(y) > 0) = 1 - e^{-\lambda \int_0^y \bar{G}(t - s)\, ds}, \qquad y \le t$$

Differentiating gives the density function of $X$:

$$f_X(y) = \lambda \bar{G}(t - y)\, e^{-\lambda \int_0^y \bar{G}(t - s)\, ds}, \qquad y \le t$$

To use the identity

$$P(R(t) = n) = \int_0^t P(R(t) = n \mid X = y)\, f_X(y)\, dy \tag{5.19}$$

note that if $X = y \le t$ then the leading car that is on the road at time $t$ entered at
time $y$. Because all other cars that arrive between $y$ and $t$ will also be on the road
at time $t$, it follows that, conditional on $X = y$, the number of cars on the road at
time $t$ will be distributed as 1 plus a Poisson random variable with mean $\lambda(t - y)$.
Therefore, for $n > 0$

$$
P(R(t) = n \mid X = y) = \begin{cases} e^{-\lambda(t - y)} \dfrac{(\lambda(t - y))^{n-1}}{(n - 1)!}, & \text{if } y \le t \\ 0, & \text{if } y = \infty \end{cases}
$$
Substituting this into Equation (5.19) yields

$$P(R(t) = n) = \int_0^t e^{-\lambda(t - y)} \frac{(\lambda(t - y))^{n-1}}{(n - 1)!}\, \lambda \bar{G}(t - y)\, e^{-\lambda \int_0^y \bar{G}(t - s)\, ds}\, dy$$

(b) Let $T$ be the free travel time of the car that enters the road at time $y$, and let $A(y)$
be its actual travel time. To determine $P(A(y) < x)$, let $t = y + x$ and note that
$A(y)$ will be less than $x$ if and only if both $T < x$ and there have been no type 1
events (using $t = y + x$) before time $y$. That is,

$$A(y) < x \iff T < x,\ N_1(y) = 0$$

Because $T$ is independent of what has occurred prior to time $y$, the preceding gives

$$
\begin{aligned}
P(A(y) < x) &= P(T < x)\, P(N_1(y) = 0) \\
&= G(x)\, e^{-\lambda \int_0^y \bar{G}(y + x - s)\, ds}
\end{aligned}
$$
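The mass function of $R(t)$ obtained in part (a) has no closed form in general, but it is straightforward to approximate. The sketch below discretizes the integral in Equation (5.19) with a midpoint rule; the function name, the step count, and the test case (entry rate 2 and exponential free travel times with mean 1, so that $\bar{G}(u) = e^{-u}$) are assumptions made for illustration only.

```python
import math

def road_pmf(lam, Gbar, t, n_max, steps=2000):
    """Approximate P(R(t) = n), n = 0..n_max, by discretizing Equation (5.19)."""
    h = t / steps
    pmf = [0.0] * (n_max + 1)
    cum = 0.0                                  # integral of Gbar(t - s) over (0, k*h)
    for k in range(steps):
        y = (k + 0.5) * h                      # midpoint of the k-th cell
        g = Gbar(t - y)
        inner = cum + 0.5 * h * g              # integral of Gbar(t - s) over (0, y)
        f_x = lam * g * math.exp(-lam * inner) # density of X at y
        mean = lam * (t - y)
        term = math.exp(-mean)                 # Poisson(mean) probability of 0
        for n in range(1, n_max + 1):
            pmf[n] += term * f_x * h           # term is the Poisson probability of n - 1
            term *= mean / n                   # advance to the Poisson probability of n
        cum += h * g
    pmf[0] = math.exp(-lam * cum)              # P(N_1(t) = 0)
    return pmf

# Hypothetical case: entry rate 2, free travel times exponential with mean 1.
pmf = road_pmf(lam=2.0, Gbar=lambda u: math.exp(-u), t=3.0, n_max=40)
print(pmf[0], sum(pmf))
```

Because the probabilities for $n \ge 1$ integrate $f_X$ against a Poisson mass function, the values should sum to approximately 1 once $n_{\max}$ is large enough to capture the Poisson tail.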



Example 5.20 (Tracking the Number of HIV Infections) There is a relatively long
incubation period from the time when an individual becomes infected with the HIV
virus, which causes AIDS, until the symptoms of the disease appear. As a result, it
is difficult for public health officials to be certain of the number of members of the
population that are infected at any given time. We will now present a first approximation
model for this phenomenon, which can be used to obtain a rough estimate of the number
of infected individuals.

Let us suppose that individuals contract the HIV virus in accordance with a Poisson
process whose rate $\lambda$ is unknown. Suppose that the time from when an individual
becomes infected until symptoms of the disease appear is a random variable having
a known distribution $G$. Suppose also that the incubation times of different infected
individuals are independent.

Let $N_1(t)$ denote the number of individuals who have shown symptoms of the
disease by time $t$. Also, let $N_2(t)$ denote the number who are HIV positive but have
not yet shown any symptoms by time $t$. Now, since an individual who contracts the
virus at time $s$ will have symptoms by time $t$ with probability $G(t - s)$ and will not
with probability $\bar{G}(t - s)$, it follows from Proposition 5.3 that $N_1(t)$ and $N_2(t)$ are
independent Poisson random variables with respective means

$$E[N_1(t)] = \lambda \int_0^t G(t - s)\, ds = \lambda \int_0^t G(y)\, dy$$

and

$$E[N_2(t)] = \lambda \int_0^t \bar{G}(t - s)\, ds = \lambda \int_0^t \bar{G}(y)\, dy$$

Now, if we knew $\lambda$, then we could use it to estimate $N_2(t)$, the number of individuals
infected but without any outward symptoms at time $t$, by its mean value $E[N_2(t)]$.
However, since $\lambda$ is unknown, we must first estimate it. Now, we will presumably
know the value of $N_1(t)$, and so we can use its known value as an estimate of its mean
$E[N_1(t)]$. That is, if the number of individuals who have exhibited symptoms by time $t$
is $n_1$, then we can estimate that

$$n_1 \approx E[N_1(t)] = \lambda \int_0^t G(y)\, dy$$

Therefore, we can estimate $\lambda$ by the quantity $\hat{\lambda}$ given by

$$\hat{\lambda} = \frac{n_1}{\int_0^t G(y)\, dy}$$

Using this estimate of $\lambda$, we can estimate the number of infected but symptomless
individuals at time $t$ by

$$\text{estimate of } N_2(t) = \hat{\lambda} \int_0^t \bar{G}(y)\, dy = \frac{n_1 \int_0^t \bar{G}(y)\, dy}{\int_0^t G(y)\, dy}$$

For example, suppose that $G$ is exponential with mean $\mu$. Then $\bar{G}(y) = e^{-y/\mu}$, and a
simple integration gives that

$$\text{estimate of } N_2(t) = \frac{n_1 \mu (1 - e^{-t/\mu})}{t - \mu(1 - e^{-t/\mu})}$$

If we suppose that $t = 16$ years, $\mu = 10$ years, and $n_1 = 220$ thousand, then the
estimate of the number of infected but symptomless individuals at time 16 is

$$\text{estimate} = \frac{2{,}200(1 - e^{-1.6})}{16 - 10(1 - e^{-1.6})} = 218.96$$

That is, if we suppose that the foregoing model is approximately correct (and we should
be aware that the assumption of a constant infection rate $\lambda$ that is unchanging over time is
almost certainly a weak point of the model), then if the incubation period is exponential
with mean 10 years and if the total number of individuals who have exhibited AIDS
symptoms during the first 16 years of the epidemic is 220 thousand, then we can expect
that approximately 219 thousand individuals are HIV positive though symptomless at
time 16. ■
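The arithmetic of this example is easy to reproduce. The short sketch below implements the exponential-incubation estimate; the function name and argument names are illustrative choices, and $n_1$ is entered in thousands as in the text.

```python
import math

def estimate_symptomless(n1, t, mu):
    """Estimate of N_2(t) when the incubation time is exponential with mean mu."""
    survived = mu * (1.0 - math.exp(-t / mu))  # integral of Gbar(y) over (0, t)
    shown = t - survived                       # integral of G(y) over (0, t)
    return n1 * survived / shown

# Values from the example: n1 = 220 (thousand), t = 16 years, mu = 10 years.
estimate = estimate_symptomless(220, 16, 10)
print(round(estimate, 2))  # about 218.96 (thousand), as in the text
```

Varying `mu` in this function shows how sensitive the estimate is to the assumed mean incubation time, which is one reason to regard it as a first approximation only.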

Proof of Proposition 5.3 Let us compute the joint probability $P\{N_i(t) = n_i, i = 1, \ldots, k\}$.
To do so note first that in order for there to have been $n_i$ type $i$ events for
$i = 1, \ldots, k$ there must have been a total of $\sum_{i=1}^{k} n_i$ events. Hence, conditioning on
$N(t)$ yields

$$
\begin{aligned}
P\{N_1(t) = n_1, \ldots, N_k(t) = n_k\} = {} & P\Big\{N_1(t) = n_1, \ldots, N_k(t) = n_k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\} \\
& \times P\Big\{N(t) = \sum_{i=1}^{k} n_i\Big\}
\end{aligned}
$$

Now consider an arbitrary event that occurred in the interval $[0, t]$. If it had occurred
at time $s$, then the probability that it would be a type $i$ event would be $P_i(s)$. Hence,
since by Theorem 5.2 this event will have occurred at some time uniformly distributed
on $[0, t]$, it follows that the probability that this event will be a type $i$ event is

$$P_i = \frac{1}{t} \int_0^t P_i(s)\, ds$$

independently of the other events. Hence,

$$P\Big\{N_i(t) = n_i, i = 1, \ldots, k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\}$$

will just equal the multinomial probability of $n_i$ type $i$ outcomes for $i = 1, \ldots, k$
when each of $\sum_{i=1}^{k} n_i$ independent trials results in outcome $i$ with probability $P_i$,
$i = 1, \ldots, k$. That is,

$$P\Big\{N_1(t) = n_1, \ldots, N_k(t) = n_k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\} = \frac{\left(\sum_{i=1}^{k} n_i\right)!}{n_1! \cdots n_k!}\, P_1^{n_1} \cdots P_k^{n_k}$$

Consequently,

$$
\begin{aligned}
P\{N_1(t) = n_1, \ldots, N_k(t) = n_k\} &= \frac{\left(\sum_i n_i\right)!}{n_1! \cdots n_k!}\, P_1^{n_1} \cdots P_k^{n_k}\, e^{-\lambda t} \frac{(\lambda t)^{\sum_i n_i}}{\left(\sum_i n_i\right)!} \\
&= \prod_{i=1}^{k} e^{-\lambda t P_i} (\lambda t P_i)^{n_i} / n_i!
\end{aligned}
$$

and the proof is complete. ■
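The final step of the proof is an algebraic identity: a multinomial probability times a Poisson probability equals a product of Poisson probabilities. It can be spot-checked numerically, as in the sketch below. The function names and the particular counts, probabilities, rate, and horizon are hypothetical values chosen for illustration.

```python
import math

def via_conditioning(counts, probs, lam, t):
    """Multinomial probability given N(t) = sum(counts), times P{N(t) = sum(counts)}."""
    n = sum(counts)
    multinomial = float(math.factorial(n))
    for n_i, p_i in zip(counts, probs):
        multinomial *= p_i ** n_i / math.factorial(n_i)
    return multinomial * math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)

def via_product(counts, probs, lam, t):
    """Product of Poisson probabilities with means lam * t * P_i."""
    result = 1.0
    for n_i, p_i in zip(counts, probs):
        mean = lam * t * p_i
        result *= math.exp(-mean) * mean ** n_i / math.factorial(n_i)
    return result

# Hypothetical inputs: three types, averaged type probabilities P_i summing to 1.
counts, probs = (2, 0, 3), (0.5, 0.2, 0.3)
lhs = via_conditioning(counts, probs, lam=1.5, t=2.0)
rhs = via_product(counts, probs, lam=1.5, t=2.0)
print(lhs, rhs)  # the two values should agree up to rounding
```

The agreement relies on $\sum_i P_i = 1$, which is what lets $e^{-\lambda t}$ factor into $\prod_i e^{-\lambda t P_i}$.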

We now present some additional examples of the usefulness of Theorem 5.2. 

Example 5.21 Insurance claims are made at times distributed according to a Poisson
process with rate $\lambda$; the successive claim amounts are independent random variables
having distribution $G$ with mean $\mu$, and are independent of the claim arrival times.









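Let $S_i$ and $C_i$ denote, respectively, the time and the amount of the $i$th claim. The total
discounted cost of all claims made up to time $t$, call it $D(t)$, is defined by

$$D(t) = \sum_{i=1}^{N(t)} e^{-\alpha S_i} C_i$$

where $\alpha$ is the discount rate and $N(t)$ is the number of claims made by time $t$. To
determine the expected value of $D(t)$, we condition on $N(t)$ to obtain

$$E[D(t)] = \sum_{n=0}^{\infty} E[D(t) \mid N(t) = n]\, e^{-\lambda t} \frac{(\lambda t)^n}{n!}$$

Now, conditional on $N(t) = n$, the claim arrival times $S_1, \ldots, S_n$ are distributed as
the ordered values $U_{(1)}, \ldots, U_{(n)}$ of $n$ independent uniform $(0, t)$ random variables
$U_1, \ldots, U_n$. Therefore,

$$
\begin{aligned}
E[D(t) \mid N(t) = n] &= E\left[\sum_{i=1}^{n} C_i e^{-\alpha U_{(i)}}\right] \\
&= \sum_{i=1}^{n} E[C_i e^{-\alpha U_{(i)}}] \\
&= \sum_{i=1}^{n} E[C_i]\, E[e^{-\alpha U_{(i)}}]
\end{aligned}
$$

where the final equality used the independence of the claim amounts and their arrival
times. Because $E[C_i] = \mu$, continuing the preceding gives

$$
\begin{aligned}
E[D(t) \mid N(t) = n] &= \mu \sum_{i=1}^{n} E[e^{-\alpha U_{(i)}}] \\
&= \mu E\left[\sum_{i=1}^{n} e^{-\alpha U_{(i)}}\right] \\
&= \mu E\left[\sum_{i=1}^{n} e^{-\alpha U_i}\right]
\end{aligned}
$$

The last equality follows because $U_{(1)}, \ldots, U_{(n)}$ are the values $U_1, \ldots, U_n$ in increasing
order, and so $\sum_{i=1}^{n} e^{-\alpha U_{(i)}} = \sum_{i=1}^{n} e^{-\alpha U_i}$. Continuing the string of equalities yields

$$
\begin{aligned}
E[D(t) \mid N(t) = n] &= n \mu E[e^{-\alpha U}] \\
&= n \frac{\mu}{t} \int_0^t e^{-\alpha x}\, dx \\
&= n \frac{\mu}{\alpha t} (1 - e^{-\alpha t})
\end{aligned}
$$

where $U$ is uniform on $(0, t)$. Therefore,

$$E[D(t) \mid N(t)] = N(t) \frac{\mu}{\alpha t} (1 - e^{-\alpha t})$$

Taking expectations yields the result

$$E[D(t)] = \frac{\lambda \mu}{\alpha} (1 - e^{-\alpha t}) \qquad \blacksquare$$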
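The closed form for $E[D(t)]$ can be compared against a direct simulation of the claim process. The sketch below is only an illustration: the function names, the seed, the number of runs, and the choice of exponential claim amounts with mean 1, claim rate 2, discount rate 0.1, and horizon 10 are all assumptions not taken from the text.

```python
import math
import random

def simulate_discounted_cost(lam, alpha, t, draw_claim, rng):
    """One realization of D(t): the sum of exp(-alpha * S_i) * C_i over claims by time t."""
    total, s = 0.0, rng.expovariate(lam)
    while s < t:
        total += math.exp(-alpha * s) * draw_claim(rng)
        s += rng.expovariate(lam)
    return total

def expected_discounted_cost(lam, mu, alpha, t):
    """Closed form E[D(t)] = (lam * mu / alpha) * (1 - exp(-alpha * t))."""
    return lam * mu / alpha * (1.0 - math.exp(-alpha * t))

# Hypothetical parameters; claims are drawn as exponential with mean mu.
lam, mu, alpha, t = 2.0, 1.0, 0.1, 10.0
rng = random.Random(1)
runs = 20000
estimate = sum(
    simulate_discounted_cost(lam, alpha, t, lambda r: r.expovariate(1.0 / mu), rng)
    for _ in range(runs)
) / runs
exact = expected_discounted_cost(lam, mu, alpha, t)
print(estimate, exact)
```

Only the mean $\mu$ of the claim distribution enters the closed form, so replacing the exponential claims with any other distribution having the same mean should leave the simulated average close to the same value.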


Example 5.22 (An Optimization Example) Suppose that items arrive at a processing
plant in accordance with a Poisson process with rate $\lambda$. At a fixed time $T$, all items are
dispatched from the system. The problem is to choose an intermediate time, $t \in (0, T)$,
at which all items in the system are dispatched, so as to minimize the total expected
wait of all items.

If we dispatch at time $t$, $0 < t < T$, then the expected total wait of all items will be

$$\frac{\lambda t^2}{2} + \frac{\lambda (T - t)^2}{2}$$

To see why this is true, we reason as follows: The expected number of arrivals in $(0, t)$
is $\lambda t$, and each arrival is uniformly distributed on $(0, t)$, and hence has expected wait
$t/2$. Thus, the expected total wait of items arriving in $(0, t)$ is $\lambda t^2/2$. Similar reasoning
holds for arrivals in $(t, T)$, and the preceding follows. To minimize this quantity, we
differentiate with respect to $t$ to obtain

$$\frac{d}{dt}\left[\lambda \frac{t^2}{2} + \lambda \frac{(T - t)^2}{2}\right] = \lambda t - \lambda (T - t)$$

and equating to 0 shows that the dispatch time that minimizes the expected total wait
is $t = T/2$. ■
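The minimizer can also be confirmed by a brute-force search over candidate dispatch times. In the sketch below the function name, the rate 3, the final time $T = 8$, and the grid of 999 candidate times are illustrative assumptions.

```python
def expected_total_wait(lam, t, T):
    """Expected total wait when items are dispatched at times t and T."""
    return lam * t ** 2 / 2 + lam * (T - t) ** 2 / 2

# Hypothetical values: rate 3 items per unit time, final dispatch at T = 8.
lam, T = 3.0, 8.0
grid = [T * k / 1000 for k in range(1, 1000)]   # candidate times in (0, T)
best = min(grid, key=lambda t: expected_total_wait(lam, t, T))
print(best, expected_total_wait(lam, best, T))  # best should be T / 2
```

Note that the rate only scales the objective, so the optimal dispatch time does not depend on it.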

We end this section with a result, quite similar in spirit to Theorem 5.2, which states
that given $S_n$, the time of the $n$th event, then the first $n - 1$ event times are distributed as
the ordered values of a set of $n - 1$ random variables uniformly distributed on $(0, S_n)$.

Proposition 5.4 Given that $S_n = t$, the set $S_1, \ldots, S_{n-1}$ has the distribution of a set
of $n - 1$ independent uniform $(0, t)$ random variables.

Proof. We can prove the preceding in the same manner as we did Theorem 5.2, or we
can argue more loosely as follows:

$$
\begin{aligned}
S_1, \ldots, S_{n-1} \mid S_n = t &\sim S_1, \ldots, S_{n-1} \mid S_n = t,\ N(t^-) = n - 1 \\
&\sim S_1, \ldots, S_{n-1} \mid N(t^-) = n - 1
\end{aligned}
$$

where $\sim$ means “has the same distribution as” and $t^-$ is infinitesimally smaller than $t$.
The result now follows from Theorem 5.2. ■
5.3.6 Estimating Software Reliability

When a new computer software package is developed, a testing procedure is often put 
into effect to eliminate the faults, or bugs, in the package. One common procedure is to 
try the package on a set of well-known problems to see if any errors result. This goes 
on for some fixed time, with all resulting errors being noted. Then the testing stops and 
the package is carefully checked to determine the specific bugs that were responsible 
for the observed errors. The package is then altered to remove these bugs. Because we 
cannot be certain that all the bugs in the package have been eliminated, however, a 
problem of great importance is the estimation of the error rate of the revised software 
package. 

To model the preceding, let us suppose that initially the package contains an unknown
number, $m$, of bugs, which we will refer to as bug 1, bug 2, ..., bug $m$. Suppose also
that bug $i$ will cause errors to occur in accordance with a Poisson process having an
unknown rate $\lambda_i$, $i = 1, \ldots, m$. Then, for instance, the number of errors due to bug $i$
that occurs in any $s$ units of operating time is Poisson distributed with mean $\lambda_i s$. Also
suppose that these Poisson processes caused by bugs $i$, $i = 1, \ldots, m$, are independent.
In addition, suppose that the package is to be run for $t$ time units with all resulting
errors being noted. At the end of this time a careful check of the package is made to
determine the specific bugs that caused the errors (that is, a debugging takes place).
These bugs are removed, and the problem is then to determine the error rate for the
revised package.

If we let

$$
\psi_i(t) = \begin{cases} 1, & \text{if bug } i \text{ has not caused an error by } t \\ 0, & \text{otherwise} \end{cases}
$$

then the quantity we wish to estimate is

$$\Lambda(t) = \sum_i \lambda_i \psi_i(t)$$

the error rate of the final package. To start, note that

$$
\begin{aligned}
E[\Lambda(t)] &= \sum_i \lambda_i E[\psi_i(t)] \\
&= \sum_i \lambda_i e^{-\lambda_i t}
\end{aligned}
\tag{5.20}
$$

Now each of the bugs that is discovered would have been responsible for a certain
number of errors. Let us denote by $M_j(t)$ the number of bugs that were responsible
for $j$ errors, $j \ge 1$. That is, $M_1(t)$ is the number of bugs that caused exactly one error,
$M_2(t)$ is the number that caused two errors, and so on, with $\sum_j j M_j(t)$ equaling the
total number of errors that resulted. To compute $E[M_1(t)]$, let us define the indicator
variables, $I_i(t)$, $i \ge 1$, by

$$
I_i(t) = \begin{cases} 1, & \text{bug } i \text{ causes exactly 1 error} \\ 0, & \text{otherwise} \end{cases}
$$
Then,

$$M_1(t) = \sum_i I_i(t)$$

and so

$$E[M_1(t)] = \sum_i E[I_i(t)] = \sum_i \lambda_i t e^{-\lambda_i t} \tag{5.21}$$

Thus, from (5.20) and (5.21) we obtain the intriguing result that

$$E\left[\Lambda(t) - \frac{M_1(t)}{t}\right] = 0 \tag{5.22}$$

This suggests the possible use of $M_1(t)/t$ as an estimate of $\Lambda(t)$. To determine whether
or not $M_1(t)/t$ constitutes a “good” estimate of $\Lambda(t)$ we shall look at how far apart
these two quantities tend to be. That is, we will compute


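$$
\begin{aligned}
E\left[\left(\Lambda(t) - \frac{M_1(t)}{t}\right)^2\right] &= \mathrm{Var}\left(\Lambda(t) - \frac{M_1(t)}{t}\right) \qquad \text{from (5.22)} \\
&= \mathrm{Var}(\Lambda(t)) - \frac{2}{t} \mathrm{Cov}(\Lambda(t), M_1(t)) + \frac{1}{t^2} \mathrm{Var}(M_1(t))
\end{aligned}
$$

Now,

$$
\begin{aligned}
\mathrm{Var}(\Lambda(t)) &= \sum_i \lambda_i^2\, \mathrm{Var}(\psi_i(t)) = \sum_i \lambda_i^2 e^{-\lambda_i t}(1 - e^{-\lambda_i t}), \\
\mathrm{Var}(M_1(t)) &= \sum_i \mathrm{Var}(I_i(t)) = \sum_i \lambda_i t e^{-\lambda_i t}(1 - \lambda_i t e^{-\lambda_i t}), \\
\mathrm{Cov}(\Lambda(t), M_1(t)) &= \mathrm{Cov}\Big(\sum_i \lambda_i \psi_i(t), \sum_j I_j(t)\Big) \\
&= \sum_i \sum_j \mathrm{Cov}(\lambda_i \psi_i(t), I_j(t)) \\
&= \sum_i \lambda_i\, \mathrm{Cov}(\psi_i(t), I_i(t)) \\
&= -\sum_i \lambda_i e^{-\lambda_i t}\, \lambda_i t e^{-\lambda_i t}
\end{aligned}
$$

where the last two equalities follow since $\psi_i(t)$ and $I_j(t)$ are independent when $i \ne j$
because they refer to different Poisson processes and $\psi_i(t) I_i(t) = 0$. Hence, we obtain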


$$
\begin{aligned}
E\left[\left(\Lambda(t) - \frac{M_1(t)}{t}\right)^2\right] &= \sum_i \lambda_i^2 e^{-\lambda_i t} + \frac{1}{t} \sum_i \lambda_i e^{-\lambda_i t} \\
&= \frac{E[M_1(t) + 2M_2(t)]}{t^2}
\end{aligned}
$$

where the last equality follows from (5.21) and the identity (which we leave as an
exercise)

$$E[M_2(t)] = \frac{1}{2} \sum_i (\lambda_i t)^2 e^{-\lambda_i t}$$

Thus, we can estimate the average square of the difference between $\Lambda(t)$ and $M_1(t)/t$
by the observed value of $M_1(t) + 2M_2(t)$ divided by $t^2$.

Example 5.23 Suppose that in 100 units of operating time 20 bugs are discovered of
which two resulted in exactly one, and three resulted in exactly two, errors. Then we
would estimate that $\Lambda(100)$ is something akin to the value of a random variable whose
mean is equal to 1/50 and whose variance is equal to 8/10,000. ■
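The two numbers in Example 5.23 come straight from the estimators $M_1(t)/t$ and $(M_1(t) + 2M_2(t))/t^2$. A small sketch using exact rational arithmetic makes this explicit; the function name and argument names are illustrative, not from the text.

```python
from fractions import Fraction

def error_rate_estimate(m1, m2, t):
    """Return (estimate of the error rate, estimated mean squared error).

    m1, m2: observed numbers of bugs that caused exactly one and exactly two errors.
    t: length of the testing period.
    """
    rate = Fraction(m1, t)
    mean_squared_error = Fraction(m1 + 2 * m2, t ** 2)
    return rate, mean_squared_error

# Data of Example 5.23: t = 100, two bugs with one error, three bugs with two errors.
rate, mse = error_rate_estimate(m1=2, m2=3, t=100)
print(rate, mse)  # Fractions print in lowest terms
```

The total of 20 discovered bugs plays no role in either quantity; only the bugs responsible for one or two errors matter.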


5.4 Generalizations of the Poisson Process 

5.4.1 Nonhomogeneous Poisson Process 

In this section we consider two generalizations of the Poisson process. The first of 
these is the nonhomogeneous, also called the nonstationary, Poisson process, which is 
obtained by allowing the arrival rate at time t to be a function of t. 

Definition 5.3 The counting process $\{N(t), t \ge 0\}$ is said to be a nonhomogeneous
Poisson process with intensity function $\lambda(t)$, $t \ge 0$, if

(i) $N(0) = 0$.

(ii) $\{N(t), t \ge 0\}$ has independent increments.

(iii) $P\{N(t + h) - N(t) \ge 2\} = o(h)$.

(iv) $P\{N(t + h) - N(t) = 1\} = \lambda(t)h + o(h)$.

The function $m(t)$ defined by

$$m(t) = \int_0^t \lambda(y)\, dy$$

is called the mean value function of the nonhomogeneous Poisson process, for reasons
indicated in the following important theorem.

Theorem 5.3 If $\{N(t), t \ge 0\}$ is a nonstationary Poisson process with intensity
function $\lambda(t)$, $t \ge 0$, then $N(t + s) - N(s)$ is a Poisson random variable with mean
$m(t + s) - m(s) = \int_s^{t+s} \lambda(y)\, dy$.

Proof. We first show that $N(t)$ is Poisson with mean $m(t)$ by mimicking the proof
of Theorem 5.1 for the stationary Poisson process. Letting $g(t) = E[e^{-uN(t)}]$ and
following the exact steps of that proof leads us to the equation

$$g(t + h) = g(t)\, E[e^{-uN_t(h)}]$$

where $N_t(h) = N(t + h) - N(t)$. Using that $P(N_t(h) = 0) = 1 - \lambda(t)h + o(h)$, we
obtain from Axioms (iii) and (iv) upon conditioning on whether $N_t(h)$ is 0, 1, or $\ge 2$,
that

$$g(t + h) = g(t)\big(1 - \lambda(t)h + e^{-u}\lambda(t)h + o(h)\big)$$

Hence,

$$g(t + h) - g(t) = g(t)\lambda(t)(e^{-u} - 1)h + o(h)$$

Dividing by $h$ and letting $h \to 0$ yields the differential equation

$$g'(t) = g(t)\lambda(t)(e^{-u} - 1)$$

which can be written as

$$\frac{g'(t)}{g(t)} = \lambda(t)(e^{-u} - 1)$$

Integrating both sides from 0 to $t$ gives

$$\log(g(t)) - \log(g(0)) = (e^{-u} - 1) \int_0^t \lambda(y)\, dy$$

Using that $g(0) = 1$ and that $\int_0^t \lambda(y)\, dy = m(t)$ the preceding shows that

$$g(t) = \exp\{m(t)(e^{-u} - 1)\}$$

Thus $E[e^{-uN(t)}]$, the Laplace transform of $N(t)$, is $\exp\{m(t)(e^{-u} - 1)\}$. Because the
latter is the Laplace transform of a Poisson random variable with mean $m(t)$ it follows
that $N(t)$ is Poisson with mean $m(t)$. The theorem now follows by noting that, with
$N_s(t) = N(s + t) - N(s)$, the counting process $\{N_s(t), t \ge 0\}$ is a nonstationary
Poisson process with intensity function $\lambda_s(t) = \lambda(s + t)$, $t > 0$. Consequently, $N_s(t)$
is Poisson with mean

$$\int_0^t \lambda_s(y)\, dy = \int_0^t \lambda(s + y)\, dy = \int_s^{s+t} \lambda(x)\, dx$$

and the result is proven. ■

Remark That $N(s + t) - N(s)$ has a Poisson distribution with mean $\int_s^{s+t} \lambda(y)\, dy$ is a
consequence of the Poisson limit of the sum of independent Bernoulli random variables
(see Example 2.47). To see why, subdivide the interval $[s, s + t]$ into $n$ subintervals
of length $t/n$, where subinterval $i$ goes from $s + (i - 1)\frac{t}{n}$ to $s + i\frac{t}{n}$, $i = 1, \ldots, n$. Let
$N_i = N(s + i\frac{t}{n}) - N(s + (i - 1)\frac{t}{n})$ be the number of events that occur in subinterval
$i$, and note that

$$
\begin{aligned}
P\{\ge 2 \text{ events in some subinterval}\} &= P\left(\bigcup_{i=1}^{n} \{N_i \ge 2\}\right) \\
&\le \sum_{i=1}^{n} P\{N_i \ge 2\} \\
&= n\, o(t/n) \qquad \text{by Axiom (iii)}
\end{aligned}
$$

Because

$$\lim_{n \to \infty} n\, o(t/n) = \lim_{n \to \infty} t\, \frac{o(t/n)}{t/n} = 0$$

it follows that, as $n$ increases to $\infty$, the probability of having two or more events in any
of the $n$ subintervals goes to 0. Consequently, with a probability going to 1, $N(s + t) - N(s)$ will
equal the number of subintervals in which an event occurs. Because the probability
of an event in subinterval $i$ is $\lambda(s + i\frac{t}{n})\frac{t}{n} + o(\frac{t}{n})$, it follows, because the number of
events in different subintervals are independent, that when $n$ is large the number of
subintervals that contain an event is approximately a Poisson random variable with
mean

$$\sum_{i=1}^{n} \lambda\left(s + i\frac{t}{n}\right)\frac{t}{n} + n\, o(t/n)$$

But,

$$\lim_{n \to \infty} \left[\sum_{i=1}^{n} \lambda\left(s + i\frac{t}{n}\right)\frac{t}{n} + n\, o(t/n)\right] = \int_s^{s+t} \lambda(y)\, dy$$

and the result follows.


Time sampling an ordinary Poisson process generates a nonhomogeneous Poisson
process. That is, let $\{N(t), t \ge 0\}$ be a Poisson process with rate $\lambda$, and suppose that
an event occurring at time $t$ is, independently of what has occurred prior to $t$, counted
with probability $p(t)$. With $N_c(t)$ denoting the number of counted events by time $t$, the
counting process $\{N_c(t), t \ge 0\}$ is a nonhomogeneous Poisson process with intensity
function $\lambda(t) = \lambda p(t)$. This is verified by noting that $\{N_c(t), t \ge 0\}$ satisfies the
nonhomogeneous Poisson process axioms.

1. $N_c(0) = 0$.

2. The number of counted events in $(s, s + t)$ depends solely on the number of events
of the Poisson process in $(s, s + t)$, which is independent of what has occurred prior
to time $s$. Consequently, the number of counted events in $(s, s + t)$ is independent of
the process of counted events prior to $s$, thus establishing the independent increment
property.

3. Let $N_c(t, t + h) = N_c(t + h) - N_c(t)$, with a similar definition of $N(t, t + h)$. Then

$$P\{N_c(t, t + h) \ge 2\} \le P\{N(t, t + h) \ge 2\} = o(h)$$

4. To compute $P\{N_c(t, t + h) = 1\}$, condition on $N(t, t + h)$.

$$
\begin{aligned}
P\{N_c(t, t + h) = 1\} = {} & P\{N_c(t, t + h) = 1 \mid N(t, t + h) = 1\}\, P\{N(t, t + h) = 1\} \\
& + P\{N_c(t, t + h) = 1 \mid N(t, t + h) \ge 2\}\, P\{N(t, t + h) \ge 2\} \\
= {} & P\{N_c(t, t + h) = 1 \mid N(t, t + h) = 1\}\, \lambda h + o(h) \\
= {} & p(t) \lambda h + o(h)
\end{aligned}
$$
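The time sampling construction translates directly into a simulation recipe: generate an ordinary Poisson process with rate $\lambda$ and keep an event at time $t$ with probability $p(t)$. The sketch below is one way to write this. The function name, the seed, the number of runs, and the test case ($\lambda = 4$ and $p(t) = t/5$ on $(0, 5)$, so that the mean value function at 5 is $\int_0^5 4t/5\, dt = 10$) are assumptions for illustration.

```python
import random

def sampled_event_times(lam, p, horizon, rng):
    """Counted event times on (0, horizon): events of a rate-lam Poisson
    process, each kept independently with probability p(t)."""
    counted, t = [], rng.expovariate(lam)
    while t < horizon:
        if rng.random() < p(t):   # count this event with probability p(t)
            counted.append(t)
        t += rng.expovariate(lam)
    return counted

# Hypothetical case: lam = 4 and p(t) = t / 5, giving intensity 4t/5 on (0, 5).
rng = random.Random(2)
counts = [len(sampled_event_times(4.0, lambda t: t / 5.0, 5.0, rng))
          for _ in range(5000)]
average = sum(counts) / len(counts)
print(average)  # should be near m(5) = 10
```

This construction requires $p(t) \le 1$, so to simulate a given intensity function $\lambda(t)$ in this way the base rate $\lambda$ must be at least as large as $\lambda(t)$ over the horizon.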
The importance of the nonhomogeneous Poisson process resides in the fact that we
no longer require the condition of stationary increments. Thus we now allow for the 
possibility that events may be more likely to occur at certain times than during other 
times. 

Example 5.24 Siegbert runs a hot dog stand that opens at 8 A.M. From 8 until 11 A.M. 
customers seem to arrive, on the average, at a steadily increasing rate that starts with an 
initial rate of 5 customers per hour at 8 A.M. and reaches a maximum of 20 customers 
per hour at 11 A.M. From 11 A.M. until 1 P.M. the (average) rate seems to remain constant 
at 20 customers per hour. However, the (average) arrival rate then drops steadily from 1 
P.M. until closing time at 5 P.M. at which time it has the value of 12 customers per hour. 
If we assume that the numbers of customers arriving at Siegbert’s stand during disjoint 
time periods are independent, then what is a good probability model for the preceding? 
What is the probability that no customers arrive between 8:30 A.M. and 9:30 A.M. on 
Monday morning? What is the expected number of arrivals in this period? 

Solution: A good model for the preceding would be to assume that arrivals constitute
a nonhomogeneous Poisson process with intensity function $\lambda(t)$ given by

$$
\lambda(t) = \begin{cases} 5 + 5t, & 0 \le t \le 3 \\ 20, & 3 \le t \le 5 \\ 20 - 2(t - 5), & 5 \le t \le 9 \end{cases}
$$

and

$$\lambda(t) = \lambda(t - 9) \quad \text{for } t > 9$$

Note that $N(t)$ represents the number of arrivals during the first $t$ hours that the store
is open. That is, we do not count the hours between 5 P.M. and 8 A.M. If for some
reason we wanted $N(t)$ to represent the number of arrivals during the first $t$ hours
regardless of whether the store was open or not, then, assuming that the process
begins at midnight we would let

$$
\lambda(t) = \begin{cases} 0, & 0 \le t \le 8 \\ 5 + 5(t - 8), & 8 < t \le 11 \\ 20, & 11 < t \le 13 \\ 20 - 2(t - 13), & 13 < t \le 17 \\ 0, & 17 < t \le 24 \end{cases}
$$

and

$$\lambda(t) = \lambda(t - 24) \quad \text{for } t > 24$$

As the number of arrivals between 8:30 A.M. and 9:30 A.M. will be Poisson with
mean $m(\frac{3}{2}) - m(\frac{1}{2})$ in the first representation (and $m(\frac{19}{2}) - m(\frac{17}{2})$ in the second
representation), we have that the probability that this number is zero is

$$\exp\left\{-\int_{1/2}^{3/2} (5 + 5t)\, dt\right\} = e^{-10}$$

and the mean number of arrivals is

$$\int_{1/2}^{3/2} (5 + 5t)\, dt = 10$$
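The calculation for Siegbert's stand can be reproduced numerically from the first representation of the intensity function. The sketch below uses a midpoint rule for the mean value function; the function names and step count are illustrative assumptions.

```python
import math

def intensity(t):
    """Arrival rate t open hours after 8 A.M., repeating every 9 open hours."""
    t = t % 9
    if t <= 3:
        return 5 + 5 * t
    if t <= 5:
        return 20
    return 20 - 2 * (t - 5)

def mean_arrivals(a, b, steps=10000):
    """Midpoint-rule approximation of m(b) - m(a), the integral of the intensity."""
    h = (b - a) / steps
    return h * sum(intensity(a + (k + 0.5) * h) for k in range(steps))

expected = mean_arrivals(0.5, 1.5)   # 8:30 A.M. to 9:30 A.M.
p_none = math.exp(-expected)
print(expected, p_none)              # mean 10 and probability exp(-10)
```

The same helper gives the expected number of arrivals over any other window, for example `mean_arrivals(0, 9)` for a full day of business.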


Suppose that events occur according to a Poisson process with rate $\lambda$, and suppose
that, independent of what has previously occurred, an event at time $s$ is a type 1
event with probability $P_1(s)$ or a type 2 event with probability $P_2(s) = 1 - P_1(s)$. If
$N_i(t)$, $t \ge 0$, denotes the number of type $i$ events by time $t$, then it easily follows from
Definition 5.3 that $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are independent nonhomogeneous
Poisson processes with respective intensity functions $\lambda_i(t) = \lambda P_i(t)$, $i = 1, 2$. (The
proof mimics that of Proposition 5.2.) This result gives us another way of understanding
(or of proving) the time sampling Poisson process result of Proposition 5.3, which
states that $N_1(t)$ and $N_2(t)$ are independent Poisson random variables with means
$E[N_i(t)] = \lambda \int_0^t P_i(s)\, ds$, $i = 1, 2$.

Example 5.25 (The Output Process of an Infinite Server Poisson Queue) It turns
out that the output process of the $M/G/\infty$ queue (that is, of the infinite server queue
having Poisson arrivals and general service distribution $G$) is a nonhomogeneous Poisson
process having intensity function $\lambda(t) = \lambda G(t)$. To verify this claim, let us first
argue that the departure process has independent increments. Towards this end, consider
nonoverlapping intervals $O_1, \ldots, O_k$; now say that an arrival is type $i$, $i = 1, \ldots, k$, if
that arrival departs in the interval $O_i$. By Proposition 5.3, it follows that the numbers
of departures in these intervals are independent, thus establishing independent increments.
Now, suppose that an arrival is “counted” if that arrival departs between $t$ and
$t + h$. Because an arrival at time $s$, $s < t + h$, will be counted with probability $P(s)$,
where

$$
P(s) = \begin{cases} G(t + h - s) - G(t - s), & \text{if } s \le t \\ G(t + h - s), & \text{if } t < s < t + h \end{cases}
$$

it follows from Proposition 5.3 that the number of departures in $(t, t + h)$ is a Poisson
random variable with mean

$$
\begin{aligned}
\lambda \int_0^{t+h} P(s)\, ds &= \lambda \int_0^{t+h} G(t + h - s)\, ds - \lambda \int_0^t G(t - s)\, ds \\
&= \lambda \int_0^{t+h} G(y)\, dy - \lambda \int_0^t G(y)\, dy \\
&= \lambda \int_t^{t+h} G(y)\, dy \\
&= \lambda G(t)h + o(h)
\end{aligned}
$$

Therefore,

$$P\{1 \text{ departure in } (t, t + h)\} = \lambda G(t)h\, e^{-\lambda G(t)h} + o(h) = \lambda G(t)h + o(h)$$

and

$$P\{\ge 2 \text{ departures in } (t, t + h)\} = o(h)$$

which completes the verification. ■
If we let $S_n$ denote the time of the $n$th event of the nonhomogeneous Poisson process,
then we can obtain its density as follows:

$$
\begin{aligned}
P\{t < S_n < t + h\} &= P\{N(t) = n - 1, \text{ one event in } (t, t + h)\} + o(h) \\
&= P\{N(t) = n - 1\}\, P\{\text{one event in } (t, t + h)\} + o(h) \\
&= e^{-m(t)} \frac{[m(t)]^{n-1}}{(n - 1)!} [\lambda(t)h + o(h)] + o(h) \\
&= \lambda(t) e^{-m(t)} \frac{[m(t)]^{n-1}}{(n - 1)!} h + o(h)
\end{aligned}
$$

which implies that

$$f_{S_n}(t) = \lambda(t) e^{-m(t)} \frac{[m(t)]^{n-1}}{(n - 1)!}$$

where

$$m(t) = \int_0^t \lambda(s)\, ds$$

5.4.2 Compound Poisson Process 

A stochastic process $\{X(t), t \ge 0\}$ is said to be a compound Poisson process if it can
be represented as

$$X(t) = \sum_{i=1}^{N(t)} Y_i \tag{5.23}$$

where $\{N(t), t \ge 0\}$ is a Poisson process, and $\{Y_i, i \ge 1\}$ is a family of independent
and identically distributed random variables that is also independent of $\{N(t), t \ge 0\}$.
As noted in Chapter 3, the random variable $X(t)$ is said to be a compound Poisson
random variable.


Examples of Compound Poisson Processes 

(i) If $Y_i \equiv 1$, then $X(t) = N(t)$, and so we have the usual Poisson process.

(ii) Suppose that buses arrive at a sporting event in accordance with a Poisson process,
and suppose that the numbers of fans in each bus are assumed to be independent
and identically distributed. Then $\{X(t), t \ge 0\}$ is a compound Poisson process
where $X(t)$ denotes the number of fans who have arrived by $t$. In Equation (5.23)
$Y_i$ represents the number of fans in the $i$th bus.

(iii) Suppose customers leave a supermarket in accordance with a Poisson process. If
the $Y_i$, the amount spent by the $i$th customer, $i = 1, 2, \ldots$, are independent and
identically distributed, then $\{X(t), t \ge 0\}$ is a compound Poisson process when
$X(t)$ denotes the total amount of money spent by time $t$. ■
Because $X(t)$ is a compound Poisson random variable with Poisson parameter $\lambda t$,
we have from Examples 3.10 and 3.17 that

$$E[X(t)] = \lambda t E[Y_1] \tag{5.24}$$

and

$$\mathrm{Var}(X(t)) = \lambda t E[Y_1^2] \tag{5.25}$$

Example 5.26 Suppose that families migrate to an area at a Poisson rate $\lambda = 2$ per
week. If the number of people in each family is independent and takes on the values
1, 2, 3, 4 with respective probabilities $\frac{1}{6}, \frac{1}{3}, \frac{1}{3}, \frac{1}{6}$, then what is the expected value and
variance of the number of individuals migrating to this area during a fixed five-week
period?

Solution: Letting $Y_i$ denote the number of people in the $i$th family, we have

$$
\begin{aligned}
E[Y_i] &= 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{3} + 3 \cdot \tfrac{1}{3} + 4 \cdot \tfrac{1}{6} = \tfrac{5}{2}, \\
E[Y_i^2] &= 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{3} + 3^2 \cdot \tfrac{1}{3} + 4^2 \cdot \tfrac{1}{6} = \tfrac{43}{6}
\end{aligned}
$$

Hence, letting $X(5)$ denote the number of immigrants during a five-week period, we
obtain from Equations (5.24) and (5.25) that

$$E[X(5)] = 2 \cdot 5 \cdot \tfrac{5}{2} = 25$$

and

$$\mathrm{Var}[X(5)] = 2 \cdot 5 \cdot \tfrac{43}{6} = \tfrac{215}{3} \qquad \blacksquare$$
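Equations (5.24) and (5.25) reduce the computation to the first two moments of the family size. The sketch below evaluates them with exact fractions for the data of Example 5.26; the function name and argument names are illustrative.

```python
from fractions import Fraction

def compound_poisson_moments(lam, t, values, probs):
    """Mean and variance of X(t) from Equations (5.24) and (5.25),
    for jumps Y taking values[k] with probability probs[k]."""
    first = sum(v * p for v, p in zip(values, probs))        # E[Y]
    second = sum(v * v * p for v, p in zip(values, probs))   # E[Y^2]
    return lam * t * first, lam * t * second

# Data of Example 5.26: rate 2 per week over 5 weeks, family sizes 1 to 4.
probs = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 3), Fraction(1, 6)]
mean, variance = compound_poisson_moments(2, 5, [1, 2, 3, 4], probs)
print(mean, variance)  # exact values, printed as fractions in lowest terms
```

Note that the variance uses the second moment $E[Y_1^2]$ rather than $\mathrm{Var}(Y_1)$, because the randomness of the number of families contributes as well.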

Example 5.27 (Busy Periods in Single-Server Poisson Arrival Queues) Consider 
a single-server service station in which customers arrive according to a Poisson process 
having rate $\lambda$. An arriving customer is immediately served if the server is free; if not,
the customer waits in line (that is, he or she joins the queue). The successive service 
times are independent with a common distribution. 

Such a system will alternate between idle periods when there are no customers in 
the system, so the server is idle, and busy periods when there are customers in the 
system, so the server is busy. A busy period will begin when an arrival finds the system 
empty, and because of the memoryless property of the Poisson arrivals it follows that 
the distribution of the length of a busy period will be the same for each such period. 
Let B denote the length of a busy period. We will compute its mean and variance. 

To begin, let $S$ denote the service time of the first customer in the busy period and
let $N(S)$ denote the number of arrivals during that time. Now, if $N(S) = 0$ then the
busy period will end when the initial customer completes his service, and so $B$ will
equal $S$ in this case. Now, suppose that one customer arrives during the service time
of the initial customer. Then, at time $S$ there will be a single customer in the system
who is just about to enter service. Because the arrival stream from time $S$ on will still
be a Poisson process with rate $\lambda$, it thus follows that the additional time from $S$ until
the system becomes empty will have the same distribution as a busy period. That is, if
$N(S) = 1$ then

$$B = S + B_1$$

where $B_1$ is independent of $S$ and has the same distribution as $B$.

Now, consider the general case where $N(S) = n$, so there will be $n$ customers
waiting when the server finishes his initial service. To determine the distribution of
remaining time in the busy period note that the order in which customers are served
will not affect the remaining time. Hence, let us suppose that the $n$ arrivals, call them
$C_1, \ldots, C_n$, during the initial service period are served as follows. Customer $C_1$ is
served first, but $C_2$ is not served until the only customers in the system are $C_2, \ldots, C_n$.
For instance, any customers arriving during $C_1$'s service time will be served before $C_2$.
Similarly, $C_3$ is not served until the system is free of all customers but $C_3, \ldots, C_n$,
and so on. A little thought reveals that the times between the beginnings of service of
customers $C_i$ and $C_{i+1}$, $i = 1, \ldots, n-1$, and the time from the beginning of service
of $C_n$ until there are no customers in the system, are independent random variables,
each distributed as a busy period.

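It follows from the preceding that if we let $B_1, B_2, \ldots$ be a sequence of independent
random variables, each distributed as a busy period, then we can express $B$ as

$$B = S + \sum_{i=1}^{N(S)} B_i$$

Hence,

$$E[B \mid S] = S + E\left[\sum_{i=1}^{N(S)} B_i \,\Big|\, S\right]$$

and

$$\mathrm{Var}(B \mid S) = \mathrm{Var}\left(\sum_{i=1}^{N(S)} B_i \,\Big|\, S\right)$$

However, given $S$, $\sum_{i=1}^{N(S)} B_i$ is a compound Poisson random variable, and thus from
Equations (5.24) and (5.25) we obtain

$$E[B \mid S] = S + \lambda S E[B] = (1 + \lambda E[B])S$$

$$\mathrm{Var}(B \mid S) = \lambda S E[B^2]$$

Hence,

$$E[B] = E[E[B \mid S]] = (1 + \lambda E[B])E[S]$$

implying, provided that $\lambda E[S] < 1$, that

$$E[B] = \frac{E[S]}{1 - \lambda E[S]}$$

Also, by the conditional variance formula

$$\begin{aligned}
\mathrm{Var}(B) &= \mathrm{Var}(E[B \mid S]) + E[\mathrm{Var}(B \mid S)] \\
&= (1 + \lambda E[B])^2 \mathrm{Var}(S) + \lambda E[S]E[B^2] \\
&= (1 + \lambda E[B])^2 \mathrm{Var}(S) + \lambda E[S](\mathrm{Var}(B) + E^2[B])
\end{aligned}$$

yielding

$$\mathrm{Var}(B) = \frac{\mathrm{Var}(S)(1 + \lambda E[B])^2 + \lambda E[S]E^2[B]}{1 - \lambda E[S]}$$

Using $E[B] = E[S]/(1 - \lambda E[S])$, so that $1 + \lambda E[B] = 1/(1 - \lambda E[S])$, we obtain

$$\mathrm{Var}(B) = \frac{\mathrm{Var}(S) + \lambda E^3[S]}{(1 - \lambda E[S])^3}$$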
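As a sanity check on the busy-period formulas, the sketch below evaluates $E[B] = E[S]/(1 - \lambda E[S])$ and the closed form for $\mathrm{Var}(B)$ obtained by substituting $E[B]$ into the preceding expression, which gives a cubed denominator. It then confirms that these values satisfy the unsolved conditional variance equation. The function name and the choice of exponential service times with rate `mu` are illustrative assumptions, not part of the text.

```python
def busy_period_moments(lam, mean_s, var_s):
    """Mean and variance of a busy period B for Poisson arrivals (rate lam)
    and service times with mean mean_s and variance var_s.
    Requires lam * mean_s < 1, as in the text."""
    rho = lam * mean_s
    if rho >= 1:
        raise ValueError("formulas require lam * E[S] < 1")
    mean_b = mean_s / (1 - rho)
    var_b = (var_s + lam * mean_s**3) / (1 - rho) ** 3
    return mean_b, var_b


# Illustrative case (assumption): exponential service with rate mu,
# so E[S] = 1/mu and Var(S) = 1/mu^2.
lam, mu = 1.0, 2.0
mean_s, var_s = 1 / mu, 1 / mu**2
mean_b, var_b = busy_period_moments(lam, mean_s, var_s)

# Right side of the equation
# Var(B) = (1 + lam E[B])^2 Var(S) + lam E[S] (Var(B) + E[B]^2)
rhs = (1 + lam * mean_b) ** 2 * var_s + lam * mean_s * (var_b + mean_b**2)
print(mean_b, var_b, rhs)
```

For this choice the mean is $1/(\mu - \lambda)$ and the variance is $(\mu + \lambda)/(\mu - \lambda)^3$, so both sides of the variance equation agree.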


There is a very nice representation of the compound Poisson process when the set
of possible values of the $Y_i$ is finite or countably infinite. So let us suppose that there
are numbers $\alpha_j$, $j \geq 1$, such that

$$P\{Y_i = \alpha_j\} = p_j, \qquad \sum_j p_j = 1$$

Now, a compound Poisson process arises when events occur according to a Poisson
process and each event results in a random amount $Y$ being added to the cumulative
sum. Let us say that the event is a type $j$ event whenever it results in adding the
amount $\alpha_j$, $j \geq 1$. That is, the $i$th event of the Poisson process is a type $j$ event if
$Y_i = \alpha_j$. If we let $N_j(t)$ denote the number of type $j$ events by time $t$, then it follows
from Proposition 5.2 that the random variables $N_j(t)$, $j \geq 1$, are independent Poisson
random variables with respective means

$$E[N_j(t)] = \lambda p_j t$$

Since, for each $j$, the amount $\alpha_j$ is added to the cumulative sum a total of $N_j(t)$ times
by time $t$, it follows that the cumulative sum at time $t$ can be expressed as

$$X(t) = \sum_j \alpha_j N_j(t) \tag{5.26}$$

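As a check of Equation (5.26), let us use it to compute the mean and variance of $X(t)$.
This yields

$$\begin{aligned}
E[X(t)] &= E\Big[\sum_j \alpha_j N_j(t)\Big] \\
&= \sum_j \alpha_j E[N_j(t)] \\
&= \sum_j \alpha_j \lambda p_j t \\
&= \lambda t E[Y_1]
\end{aligned}$$

Also,

$$\begin{aligned}
\mathrm{Var}[X(t)] &= \mathrm{Var}\Big[\sum_j \alpha_j N_j(t)\Big] \\
&= \sum_j \alpha_j^2 \mathrm{Var}[N_j(t)] \qquad \text{by the independence of the } N_j(t),\ j \geq 1 \\
&= \sum_j \alpha_j^2 \lambda p_j t \\
&= \lambda t E[Y_1^2]
\end{aligned}$$

where the next to last equality follows since the variance of the Poisson random variable
$N_j(t)$ is equal to its mean.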

Thus, we see that the representation (5.26) results in the same expressions for the 
mean and variance of X(t) as were previously derived. 
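The same check can be carried out numerically by working type by type, as in the representation (5.26). In this sketch (the function name `moments_by_type` and the sample data are illustrative assumptions) each type count $N_j(t)$ is treated as an independent Poisson variable with mean $\lambda p_j t$, and the result is compared with $\lambda t E[Y_1]$ and $\lambda t E[Y_1^2]$.

```python
from fractions import Fraction


def moments_by_type(rate, t, amounts, probs):
    """Mean and variance of X(t) = sum_j a_j N_j(t), where the N_j(t) are
    independent Poisson variables with means rate * p_j * t."""
    type_means = [rate * p * t for p in probs]  # E[N_j(t)] = Var[N_j(t)]
    mean = sum(a * m for a, m in zip(amounts, type_means))
    var = sum(a * a * m for a, m in zip(amounts, type_means))
    return mean, var


# Illustrative data (assumption): amounts 1..4 with probabilities
# 1/6, 1/3, 1/3, 1/6, rate 2 and t = 5.
amounts = [1, 2, 3, 4]
probs = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 3), Fraction(1, 6)]
rate, t = 2, 5

mean_types, var_types = moments_by_type(rate, t, amounts, probs)

# Direct formulas lambda * t * E[Y] and lambda * t * E[Y^2] for comparison.
mean_direct = rate * t * sum(a * p for a, p in zip(amounts, probs))
var_direct = rate * t * sum(a * a * p for a, p in zip(amounts, probs))
print(mean_types == mean_direct, var_types == var_direct)
```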

One of the uses of the representation (5.26) is that it enables us to conclude that as $t$
grows large, the distribution of $X(t)$ converges to the normal distribution. To see why,
note first that it follows by the central limit theorem that the distribution of a Poisson
random variable converges to a normal distribution as its mean increases. (Why is this?)
Therefore, each of the random variables $N_j(t)$ converges to a normal random variable
as $t$ increases. Because they are independent, and because the sum of independent
normal random variables is also normal, it follows that $X(t)$ also approaches a normal
distribution as $t$ increases.

Example 5.28 In Example 5.26, find the approximate probability that at least 240 
people migrate to the area within the next 50 weeks. 

Solution: Since $\lambda = 2$, $E[Y_i] = 5/2$, $E[Y_i^2] = 43/6$, we see that

$$E[X(50)] = 250, \qquad \mathrm{Var}[X(50)] = 4300/6$$

Now, the desired probability is

$$\begin{aligned}
P\{X(50) \geq 240\} &= P\{X(50) \geq 239.5\} \\
&= P\left\{\frac{X(50) - 250}{\sqrt{4300/6}} \geq \frac{239.5 - 250}{\sqrt{4300/6}}\right\} \\
&\approx 1 - \Phi(-0.3922) \\
&= \Phi(0.3922) \\
&= 0.6525
\end{aligned}$$

where Table 2.3 was used to determine $\Phi(0.3922)$, the probability that a standard
normal is less than 0.3922. ■
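The normal approximation in Example 5.28 can be reproduced without a table. The sketch below is illustrative only: it expresses the standard normal distribution function through `math.erf` (the helper name `std_normal_cdf` is an assumption) and applies the same continuity correction of 0.5 used in the solution.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rate, weeks = 2, 50
mean_y, second_moment_y = 5 / 2, 43 / 6

mean_x = rate * weeks * mean_y          # E[X(50)] = 250
var_x = rate * weeks * second_moment_y  # Var[X(50)] = 4300/6

# Continuity correction: P{X >= 240} is approximated by P{X >= 239.5}.
z = (239.5 - mean_x) / math.sqrt(var_x)
prob = 1.0 - std_normal_cdf(z)
print(round(z, 4), round(prob, 4))
```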

Another useful result is that if $\{X(t), t \geq 0\}$ and $\{Y(t), t \geq 0\}$ are independent compound
Poisson processes with respective Poisson parameters and distributions $\lambda_1, F_1$
and $\lambda_2, F_2$, then $\{X(t) + Y(t), t \geq 0\}$ is also a compound Poisson process. This is true
because in this combined process events will occur according to a Poisson process with
rate $\lambda_1 + \lambda_2$, and each event independently will be from the first compound Poisson
process with probability $\lambda_1/(\lambda_1 + \lambda_2)$. Consequently, the combined process will be
a compound Poisson process with Poisson parameter $\lambda_1 + \lambda_2$, and with distribution
function $F$ given by

$$F(x) = \frac{\lambda_1}{\lambda_1 + \lambda_2} F_1(x) + \frac{\lambda_2}{\lambda_1 + \lambda_2} F_2(x)$$


5.4.3 Conditional or Mixed Poisson Processes 


Let $\{N(t), t \geq 0\}$ be a counting process whose probabilities are defined as follows.
There is a positive random variable $L$ such that, conditional on $L = \lambda$, the counting
process is a Poisson process with rate $\lambda$. Such a counting process is called a conditional
or a mixed Poisson process.

Suppose that $L$ is continuous with density function $g$. Because

$$\begin{aligned}
P\{N(t+s) - N(s) = n\} &= \int_0^\infty P\{N(t+s) - N(s) = n \mid L = \lambda\}\, g(\lambda)\, d\lambda \\
&= \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^n}{n!}\, g(\lambda)\, d\lambda
\end{aligned} \tag{5.27}$$


we see that a conditional Poisson process has stationary increments. However, because
knowing how many events occur in an interval gives information about the possible
value of $L$, which affects the distribution of the number of events in any other interval,
it follows that a conditional Poisson process does not generally have independent
increments. Consequently, a conditional Poisson process is not generally a Poisson
process.

Example 5.29 If $g$ is the gamma density with parameters $m$ and $\theta$,

$$g(\lambda) = \theta e^{-\theta\lambda} \frac{(\theta\lambda)^{m-1}}{(m-1)!}, \qquad \lambda > 0$$

then

$$\begin{aligned}
P\{N(t) = n\} &= \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^n}{n!}\, \theta e^{-\theta\lambda} \frac{(\theta\lambda)^{m-1}}{(m-1)!}\, d\lambda \\
&= \frac{t^n \theta^m}{n!\,(m-1)!} \int_0^\infty e^{-(t+\theta)\lambda} \lambda^{n+m-1}\, d\lambda
\end{aligned}$$

Multiplying and dividing by $\frac{(n+m-1)!}{(t+\theta)^{n+m}}$ gives

$$P\{N(t) = n\} = \frac{t^n \theta^m (n+m-1)!}{n!\,(m-1)!\,(t+\theta)^{n+m}} \int_0^\infty (t+\theta) e^{-(t+\theta)\lambda} \frac{((t+\theta)\lambda)^{n+m-1}}{(n+m-1)!}\, d\lambda$$

Because $(t+\theta)e^{-(t+\theta)\lambda}((t+\theta)\lambda)^{n+m-1}/(n+m-1)!$ is the density function of a
gamma $(n+m, t+\theta)$ random variable, its integral is 1, giving the result

$$P\{N(t) = n\} = \binom{n+m-1}{n} \left(\frac{\theta}{t+\theta}\right)^m \left(\frac{t}{t+\theta}\right)^n$$

Therefore, the number of events in an interval of length $t$ has the same distribution as
the number of failures that occur before a total of $m$ successes are amassed, when each
trial is a success with probability $\frac{\theta}{t+\theta}$. ■
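The negative binomial form of Example 5.29 is easy to evaluate directly. The following sketch is illustrative: the function name and the parameter values are assumptions, and $m$ is taken to be a positive integer as the $(m-1)!$ in the density suggests. It computes $P\{N(t) = n\} = \binom{n+m-1}{n} p^m (1-p)^n$ with $p = \theta/(t+\theta)$, then checks that the probabilities sum to 1 and that the mean equals $t\,m/\theta$, which is $t$ times the mean of the gamma rate.

```python
from math import comb


def gamma_mixed_poisson_pmf(n, t, m, theta):
    """P{N(t) = n} when the random rate L has a gamma (m, theta) density.

    Equals the probability of n failures before the m-th success when
    each trial succeeds with probability theta / (t + theta)."""
    p = theta / (t + theta)
    return comb(n + m - 1, n) * p**m * (1 - p) ** n


# Illustrative parameters (assumptions).
t, m, theta = 2.0, 3, 1.5

# Truncate the support at 400; the neglected tail is negligible here.
pmf = [gamma_mixed_poisson_pmf(n, t, m, theta) for n in range(400)]
total = sum(pmf)
mean = sum(n * q for n, q in enumerate(pmf))
print(total, mean, t * m / theta)
```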

To compute the mean and variance of $N(t)$, condition on $L$. Because, conditional
on $L$, $N(t)$ is Poisson with mean $Lt$, we obtain

$$E[N(t) \mid L] = Lt$$

$$\mathrm{Var}(N(t) \mid L) = Lt$$

where the final equality used that the variance of a Poisson random variable is equal to
its mean. Consequently, the conditional variance formula yields

$$\begin{aligned}
\mathrm{Var}(N(t)) &= E[Lt] + \mathrm{Var}(Lt) \\
&= tE[L] + t^2 \mathrm{Var}(L)
\end{aligned}$$
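To make the effect of the random rate concrete, the sketch below evaluates $E[N(t)] = tE[L]$ (which follows from $E[N(t) \mid L] = Lt$) and $\mathrm{Var}(N(t)) = tE[L] + t^2\mathrm{Var}(L)$. The function name is hypothetical, and the choice of $L$ uniform on $(0, 1)$, with mean $1/2$ and variance $1/12$, is only an illustration.

```python
from fractions import Fraction


def mixed_poisson_mean_var(t, mean_rate, var_rate):
    """Mean and variance of N(t) for a conditional (mixed) Poisson process
    whose random rate L has the given mean and variance."""
    mean = t * mean_rate
    var = t * mean_rate + t**2 * var_rate
    return mean, var


# Illustration (assumption): L uniform on (0, 1), interval length t = 3.
mean_n, var_n = mixed_poisson_mean_var(3, Fraction(1, 2), Fraction(1, 12))
print(mean_n, var_n)
```

The variance exceeds the mean whenever $\mathrm{Var}(L) > 0$, in contrast with an ordinary Poisson process, for which the two are equal.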


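We can compute the conditional distribution function of $L$, given that $N(t) = n$, as
follows.

$$\begin{aligned}
P\{L \leq x \mid N(t) = n\} &= \frac{P\{L \leq x, N(t) = n\}}{P\{N(t) = n\}} \\
&= \frac{\int_0^\infty P\{L \leq x, N(t) = n \mid L = \lambda\}\, g(\lambda)\, d\lambda}{P\{N(t) = n\}} \\
&= \frac{\int_0^x P\{N(t) = n \mid L = \lambda\}\, g(\lambda)\, d\lambda}{P\{N(t) = n\}} \\
&= \frac{\int_0^x e^{-\lambda t} (\lambda t)^n g(\lambda)\, d\lambda}{\int_0^\infty e^{-\lambda t} (\lambda t)^n g(\lambda)\, d\lambda}
\end{aligned}$$

where the final equality used Equation (5.27). In other words, the conditional density
function of $L$ given that $N(t) = n$ is

$$f_{L \mid N(t)}(\lambda \mid n) = \frac{e^{-\lambda t} \lambda^n g(\lambda)}{\int_0^\infty e^{-\lambda t} \lambda^n g(\lambda)\, d\lambda}, \qquad \lambda \geq 0 \tag{5.28}$$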


Example 5.30 An insurance company feels that each of its policyholders has a rating
value and that a policyholder having rating value $\lambda$ will make claims at times distributed
according to a Poisson process with rate $\lambda$, when time is measured in years. The firm also
believes that rating values vary from policyholder to policyholder, with the probability
distribution of the value of a new policyholder being uniformly distributed over (0, 1).
Given that a policyholder has made $n$ claims in his or her first $t$ years, what is the
conditional distribution of the time until the policyholder’s next claim?

Solution: If $T$ is the time until the next claim, then we want to compute
$P\{T > x \mid N(t) = n\}$. Conditioning on the policyholder’s rating value gives, upon using
Equation (5.28),

$$\begin{aligned}
P\{T > x \mid N(t) = n\} &= \int_0^\infty P\{T > x \mid L = \lambda, N(t) = n\}\, f_{L \mid N(t)}(\lambda \mid n)\, d\lambda \\
&= \frac{\int_0^1 e^{-\lambda x} e^{-\lambda t} \lambda^n\, d\lambda}{\int_0^1 e^{-\lambda t} \lambda^n\, d\lambda}
\end{aligned}$$
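The ratio of integrals in Example 5.30 has no convenient closed form for general $n$, but it is simple to evaluate numerically. The sketch below is an illustration only: the helper names are hypothetical, and composite Simpson's rule is one of several quadrature choices that would serve.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def prob_next_claim_after(x, n, t):
    """P{T > x | N(t) = n} when the rating value is uniform on (0, 1)."""
    num = simpson(lambda lam: math.exp(-lam * (x + t)) * lam**n, 0.0, 1.0)
    den = simpson(lambda lam: math.exp(-lam * t) * lam**n, 0.0, 1.0)
    return num / den


# A policyholder with 2 claims in the first 3 years (illustrative values).
for x in (0.0, 1.0, 2.0):
    print(x, prob_next_claim_after(x, n=2, t=3.0))
```

For $n = 0$ both integrals can be done by hand, giving $\frac{(1 - e^{-(x+t)})/(x+t)}{(1 - e^{-t})/t}$, which provides a check on the quadrature.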

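There is a nice formula for the probability that more than $n$ events occur in an interval
of length $t$. In deriving it we will use the identity

$$\sum_{j=n+1}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!} = \int_0^t \lambda e^{-\lambda x} \frac{(\lambda x)^n}{n!}\, dx \tag{5.29}$$

which follows by noting that it equates the probability that the number of events by
time $t$ of a Poisson process with rate $\lambda$ is greater than $n$ with the probability that the
time of the $(n+1)$st event of this process (which has a gamma $(n+1, \lambda)$ distribution)
is less than $t$. Interchanging $\lambda$ and $t$ in Equation (5.29) yields the equivalent identity

$$\sum_{j=n+1}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!} = \int_0^\lambda t e^{-tx} \frac{(tx)^n}{n!}\, dx \tag{5.30}$$

Using Equation (5.27) we now have

$$\begin{aligned}
P\{N(t) > n\} &= \sum_{j=n+1}^{\infty} \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^j}{j!}\, g(\lambda)\, d\lambda \\
&= \int_0^\infty \sum_{j=n+1}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!}\, g(\lambda)\, d\lambda && \text{(by interchanging)} \\
&= \int_0^\infty \int_0^\lambda t e^{-tx} \frac{(tx)^n}{n!}\, dx\, g(\lambda)\, d\lambda && \text{(using (5.30))} \\
&= \int_0^\infty \int_x^\infty g(\lambda)\, d\lambda\; t e^{-tx} \frac{(tx)^n}{n!}\, dx && \text{(by interchanging)} \\
&= \int_0^\infty \bar{G}(x)\, t e^{-tx} \frac{(tx)^n}{n!}\, dx
\end{aligned}$$

where $\bar{G}(x) = \int_x^\infty g(\lambda)\, d\lambda = P\{L > x\}$.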
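The final formula can be compared numerically with the direct computation that conditions on $L$. The sketch below is an illustration under a stated assumption: $L$ is taken to be uniform on $(0, 1)$, so that $g(\lambda) = 1$ and $P\{L > x\} = 1 - x$ on that interval and 0 beyond it. All function names are hypothetical, and Simpson's rule is used for the integrals.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def poisson_tail(mean, n):
    """P{Poisson(mean) > n}."""
    head = sum(math.exp(-mean) * mean**j / math.factorial(j)
               for j in range(n + 1))
    return 1.0 - head


def tail_by_conditioning(t, n):
    """P{N(t) > n}: integrate the Poisson tail against g = 1 on (0, 1)."""
    return simpson(lambda lam: poisson_tail(lam * t, n), 0.0, 1.0)


def tail_by_formula(t, n):
    """P{N(t) > n} from the derived formula, with P{L > x} = 1 - x."""
    def integrand(x):
        return (1.0 - x) * t * math.exp(-t * x) * (t * x) ** n / math.factorial(n)
    return simpson(integrand, 0.0, 1.0)


print(tail_by_conditioning(2.0, 3), tail_by_formula(2.0, 3))
```

The two routes agree to quadrature accuracy; for $n = 0$ and $t = 1$ both reduce to $e^{-1}$.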


5.5 Random Intensity Functions and Hawkes Processes 

Whereas the intensity function $\lambda(t)$ of a nonhomogeneous Poisson process is a deterministic
function, there are counting processes $\{N(t), t \geq 0\}$ whose intensity function
value at time $t$, call it $R(t)$, is a random variable whose value depends on the history
of the process up to time $t$. That is, if we let $H_t$ denote the “history” of the process
up to time $t$ then $R(t)$, the intensity rate at time $t$, is a random variable whose value is
determined by $H_t$ and which is such that

$$P(N(t+h) - N(t) = 1 \mid H_t) = R(t)h + o(h)$$

and

$$P(N(t+h) - N(t) \geq 2 \mid H_t) = o(h)$$

The Hawkes process is an example of a counting process having a random intensity
function. This counting process assumes that there is a base intensity value $\lambda > 0$,
and that associated with each event is a nonnegative random variable, called a mark,
whose value is independent of all that has previously occurred and has distribution $F$.
Whenever an event occurs, it is supposed that the current value of the random intensity
function increases by the amount of the event’s mark, with this increase decreasing
over time at an exponential rate $\alpha$. More specifically, if there have been a total of $N(t)$
events by time $t$, with $S_1 < S_2 < \cdots < S_{N(t)}$ being the event times and $M_i$ being the
mark of event $i$, $i = 1, \ldots, N(t)$, then

$$R(t) = \lambda + \sum_{i=1}^{N(t)} M_i e^{-\alpha(t - S_i)}$$

In other words, a Hawkes process is a counting process in which 

1. $R(0) = \lambda$;

2. whenever an event occurs, the random intensity increases by the value of the event’s
mark;

3. if there are no events between $s$ and $s + t$ then $R(s + t) = \lambda + (R(s) - \lambda)e^{-\alpha t}$.

Because the intensity increases each time an event occurs, the Hawkes process is 
said to be a self-exciting process. 
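The three defining properties can be seen directly by evaluating the intensity for a given history. The following is a minimal sketch, with a hypothetical function name and made-up event times and marks; it assumes the convention that an event occurring exactly at time $t$ already contributes its full mark to $R(t)$.

```python
import math


def hawkes_intensity(t, base, alpha, event_times, marks):
    """R(t) = base + sum of M_i * exp(-alpha * (t - S_i)) over events with
    S_i <= t (an event at time t itself is counted: an assumption)."""
    return base + sum(m * math.exp(-alpha * (t - s))
                      for s, m in zip(event_times, marks) if s <= t)


# Hypothetical history: two events with marks 0.4 and 0.7.
base, alpha = 1.0, 0.5
times, marks = [1.0, 2.0], [0.4, 0.7]

r_start = hawkes_intensity(0.0, base, alpha, times, marks)   # property 1
jump = (hawkes_intensity(1.0, base, alpha, times, marks)
        - hawkes_intensity(1.0 - 1e-9, base, alpha, times, marks))  # property 2
r_s = hawkes_intensity(2.5, base, alpha, times, marks)
r_later = hawkes_intensity(4.0, base, alpha, times, marks)
decayed = base + (r_s - base) * math.exp(-alpha * 1.5)       # property 3
print(r_start, jump, r_later, decayed)
```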

We will derive E[N(t )], the expected number of events of a Hawkes process that 
occur by time t. To do so, we will need the following lemma, which is valid for all 
counting processes. 


Lemma 5.1 Let $R(t)$, $t \geq 0$ be the random intensity function of the counting process
$\{N(t), t \geq 0\}$ having $N(0) = 0$. Then, with $m(t) = E[N(t)]$,

$$m(t) = \int_0^t E[R(s)]\, ds$$

Proof.

$$E[N(t+h) \mid N(t), R(t)] = N(t) + R(t)h + o(h)$$

Taking expectations gives

$$E[N(t+h)] = E[N(t)] + E[R(t)]h + o(h)$$

That is,

$$m(t+h) = m(t) + hE[R(t)] + o(h)$$

or

$$\frac{m(t+h) - m(t)}{h} = E[R(t)] + \frac{o(h)}{h}$$

Letting $h$ go to 0 gives

$$m'(t) = E[R(t)]$$

Integrating both sides from 0 to $t$ now gives the result:

$$m(t) = \int_0^t E[R(s)]\, ds$$


Using the preceding, we can now prove the following proposition. 

Proposition 5.5 If $\mu$ is the expected value of a mark in a Hawkes process, then for
this process

$$E[N(t)] = \lambda t + \frac{\lambda\mu}{(\mu - \alpha)^2}\left(e^{(\mu - \alpha)t} - 1 - (\mu - \alpha)t\right)$$

Proof. To determine the mean value function $m(t)$ it suffices, by the preceding lemma,
to determine $E[R(t)]$, which will be accomplished by deriving and then solving a
differential equation. To begin note that, with $M_t(h)$ equal to the sum of the marks of
all events occurring between $t$ and $t + h$,

$$R(t+h) = \lambda + (R(t) - \lambda)e^{-\alpha h} + M_t(h) + o(h)$$

Letting $g(t) = E[R(t)]$ and taking expectations of the preceding gives

$$g(t+h) = \lambda + (g(t) - \lambda)e^{-\alpha h} + E[M_t(h)] + o(h)$$

Using the identity $e^{-\alpha h} = 1 - \alpha h + o(h)$ shows that

$$\begin{aligned}
g(t+h) &= \lambda + (g(t) - \lambda)(1 - \alpha h) + E[M_t(h)] + o(h) \\
&= g(t) - \alpha h g(t) + \lambda\alpha h + E[M_t(h)] + o(h)
\end{aligned} \tag{5.31}$$

Now, given $R(t)$, there will be 1 event between $t$ and $t + h$ with probability $R(t)h + o(h)$,
and there will be 2 or more with probability $o(h)$. Hence, conditioning on the number
of events between $t$ and $t + h$ yields, upon using that $\mu$ is the expected value of a mark,
that

$$E[M_t(h) \mid R(t)] = \mu R(t)h + o(h)$$

Taking expectations of both sides of the preceding gives that

$$E[M_t(h)] = \mu g(t)h + o(h)$$

Substituting back into Equation (5.31) gives

$$g(t+h) = g(t) - \alpha h g(t) + \lambda\alpha h + \mu g(t)h + o(h)$$

or, equivalently,

$$\frac{g(t+h) - g(t)}{h} = (\mu - \alpha)g(t) + \lambda\alpha + \frac{o(h)}{h}$$

Letting $h$ go to 0 gives that

$$g'(t) = (\mu - \alpha)g(t) + \lambda\alpha$$

Letting $f(t) = (\mu - \alpha)g(t) + \lambda\alpha$, the preceding can be written as

$$\frac{f'(t)}{\mu - \alpha} = f(t)$$

or

$$\frac{f'(t)}{f(t)} = \mu - \alpha$$

Integration now yields

$$\log(f(t)) = (\mu - \alpha)t + C$$

Now, $g(0) = E[R(0)] = \lambda$ and so $f(0) = \mu\lambda$, showing that $C = \log(\mu\lambda)$ and giving
the result

$$f(t) = \mu\lambda e^{(\mu - \alpha)t}$$

Using that $g(t) = \frac{f(t) - \lambda\alpha}{\mu - \alpha}$ gives

$$g(t) = \lambda + \frac{\lambda\mu}{\mu - \alpha}\left(e^{(\mu - \alpha)t} - 1\right)$$

Hence, from Lemma 5.1

$$\begin{aligned}
E[N(t)] &= \lambda t + \int_0^t \frac{\lambda\mu}{\mu - \alpha}\left(e^{(\mu - \alpha)s} - 1\right) ds \\
&= \lambda t + \frac{\lambda\mu}{(\mu - \alpha)^2}\left(e^{(\mu - \alpha)t} - 1 - (\mu - \alpha)t\right)
\end{aligned}$$

and the result is proved.
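The two closed forms that appear in the proof can be checked against each other numerically. In this sketch the function names and parameter values are illustrative assumptions; the formulas are those of the proposition, $g(t) = E[R(t)] = \lambda + \frac{\lambda\mu}{\mu-\alpha}(e^{(\mu-\alpha)t} - 1)$ and $E[N(t)] = \lambda t + \frac{\lambda\mu}{(\mu-\alpha)^2}(e^{(\mu-\alpha)t} - 1 - (\mu-\alpha)t)$, which require $\mu \neq \alpha$. The check is that the derivative of the mean count equals the mean intensity, as the lemma asserts, and that $g$ satisfies $g'(t) = (\mu - \alpha)g(t) + \lambda\alpha$.

```python
import math


def hawkes_mean_intensity(t, lam, mu, alpha):
    """g(t) = E[R(t)] for base intensity lam, mean mark mu, decay rate alpha."""
    c = mu - alpha
    if c == 0:
        raise ValueError("closed form requires mu != alpha")
    return lam + lam * mu / c * (math.exp(c * t) - 1.0)


def hawkes_mean_count(t, lam, mu, alpha):
    """E[N(t)] from Proposition 5.5."""
    c = mu - alpha
    if c == 0:
        raise ValueError("closed form requires mu != alpha")
    return lam * t + lam * mu / c**2 * (math.exp(c * t) - 1.0 - c * t)


# Illustrative parameters (assumptions): mean mark below the decay rate.
lam, mu, alpha, t, h = 1.0, 0.5, 1.0, 2.0, 1e-5

# Central differences approximate m'(t) and g'(t).
m_prime = (hawkes_mean_count(t + h, lam, mu, alpha)
           - hawkes_mean_count(t - h, lam, mu, alpha)) / (2 * h)
g_prime = (hawkes_mean_intensity(t + h, lam, mu, alpha)
           - hawkes_mean_intensity(t - h, lam, mu, alpha)) / (2 * h)
g_t = hawkes_mean_intensity(t, lam, mu, alpha)
print(hawkes_mean_count(t, lam, mu, alpha), m_prime, g_t)
```

When $\mu < \alpha$ the mean intensity settles to a finite limit, whereas for $\mu > \alpha$ it grows exponentially, which the formula for $g(t)$ makes visible.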
Exercises


1. The time $T$ required to repair a machine is an exponentially distributed random
variable with mean $\frac12$ (hours).

(a) What is the probability that a repair time exceeds $\frac12$ hour?

(b) What is the probability that a repair takes at least $12\frac12$ hours given that its
duration exceeds 12 hours?

2. Suppose that you arrive at a single-teller bank to find five other customers in the
bank, one being served and the other four waiting in line. You join the end of
the line. If the service times are all exponential with rate $\mu$, what is the expected
amount of time you will spend in the bank?

3. Let $X$ be an exponential random variable. Without any computations, tell which
one of the following is correct. Explain your answer.

(a) $E[X^2 \mid X > 1] = E[(X + 1)^2]$

(b) $E[X^2 \mid X > 1] = E[X^2] + 1$

(c) $E[X^2 \mid X > 1] = (1 + E[X])^2$

4. Consider a post office with two clerks. Three people, A, B, and C, enter simultaneously.
A and B go directly to the clerks, and C waits until either A or B leaves
before he begins service. What is the probability that A is still in the post office
after the other two have left when

(a) the service time for each clerk is exactly (nonrandom) ten minutes?

(b) the service times are $i$ with probability $\frac13$, $i = 1, 2, 3$?

(c) the service times are exponential with mean $1/\mu$?

5. If $X$ is exponential with rate $\lambda$, show that $Y = [X] + 1$ is geometric with parameter
$p = 1 - e^{-\lambda}$, where $[x]$ is the largest integer less than or equal to $x$.

6. In Example 5.3 if server $i$ serves at an exponential rate $\lambda_i$, $i = 1, 2$, show that

P{Smith is not last} =

*7. If $X_1$ and $X_2$ are independent nonnegative continuous random variables, show
that

$$P\{X_1 < X_2 \mid \min(X_1, X_2) = t\} = \frac{r_1(t)}{r_1(t) + r_2(t)}$$

where $r_i(t)$ is the failure rate function of $X_i$.

8. If $X$ and $Y$ are independent exponential random variables with respective rates $\lambda$
and $\mu$, what is the conditional distribution of $X$ given that $X < Y$?

9. Machine 1 is currently working. Machine 2 will be put in use at a time $t$ from
now. If the lifetime of machine $i$ is exponential with rate $\lambda_i$, $i = 1, 2$, what is the
probability that machine 1 is the first machine to fail?

*10. Let $X$ and $Y$ be independent exponential random variables with respective rates
$\lambda$ and $\mu$. Let $M = \min(X, Y)$. Find

(a) $E[MX \mid M = X]$,

(b) $E[MX \mid M = Y]$,

(c) $\mathrm{Cov}(X, M)$.

11. Let $X, Y_1, \ldots, Y_n$ be independent exponential random variables; $X$ having rate
$\lambda$, and $Y_i$ having rate $\mu$. Let $A_j$ be the event that the $j$th smallest of these $n + 1$
random variables is one of the $Y_i$. Find $p = P\{X > \max_i Y_i\}$ by using the
identity

$$p = P(A_1 \cdots A_n) = P(A_1)P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cdots A_{n-1})$$

Verify your answer when $n = 2$ by conditioning on $X$ to obtain $p$.

12. If $X_i$, $i = 1, 2, 3$, are independent exponential random variables with rates $\lambda_i$,
$i = 1, 2, 3$, find

(a) $P\{X_1 < X_2 < X_3\}$,

(b) $P\{X_1 < X_2 \mid \max(X_1, X_2, X_3) = X_3\}$,

(c) $E[\max X_i \mid X_1 < X_2 < X_3]$,

(d) $E[\max X_i]$.

13. Find, in Example 5.10, the expected time until the nth person on line leaves the 
line (either by entering service or departing without service). 

14. I am waiting for two friends to arrive at my house. The time until A arrives is
exponentially distributed with rate $\lambda_a$, and the time until B arrives is exponentially
distributed with rate $\lambda_b$. Once they arrive, both will spend exponentially
distributed times, with respective rates $\mu_a$ and $\mu_b$, at my home before departing.
The four exponential random variables are independent.

(a) What is the probability that A arrives before and departs after B?

(b) What is the expected time of the last departure?

15. One hundred items are simultaneously put on a life test. Suppose the lifetimes
of the individual items are independent exponential random variables with mean
200 hours. The test will end when there have been a total of 5 failures. If $T$ is the
time at which the test ends, find $E[T]$ and $\mathrm{Var}(T)$.

16. There are three jobs that need to be processed, with the processing time of job $i$
being exponential with rate $\mu_i$. There are two processors available, so processing
on two of the jobs can immediately start, with processing on the final job to start
when one of the initial ones is finished.

(a) Let $T_i$ denote the time at which the processing of job $i$ is completed. If the
objective is to minimize $E[T_1 + T_2 + T_3]$, which jobs should be initially
processed if $\mu_1 < \mu_2 < \mu_3$?

(b) Let $M$, called the makespan, be the time until all three jobs have been processed.
With $S$ equal to the time that there is only a single processor working,
show that

$$2E[M] = E[S] + \sum_{i=1}^{3} 1/\mu_i$$

For the rest of this problem, suppose that $\mu_1 = \mu_2 = \mu$, $\mu_3 = \lambda$. Also, let
$P(\mu)$ be the probability that the last job to finish is either job 1 or job 2, and
let $P(\lambda) = 1 - P(\mu)$ be the probability that the last job to finish is job 3.

(c) Express $E[S]$ in terms of $P(\mu)$ and $P(\lambda)$.

Let $P_{i,j}(\mu)$ be the value of $P(\mu)$ when $i$ and $j$ are the jobs that are initially
started.

(d) Show that $P_{1,2}(\mu) \leq P_{1,3}(\mu)$.

(e) If $\mu > \lambda$ show that $E[M]$ is minimized when job 3 is one of the jobs that is
initially started.

(f) If $\mu < \lambda$ show that $E[M]$ is minimized when processing is initially started
on jobs 1 and 2.

17. A set of $n$ cities is to be connected via communication links. The cost to construct
a link between cities $i$ and $j$ is $C_{ij}$, $i \neq j$. Enough links should be constructed
so that for each pair of cities there is a path of links that connects them. As a
result, only $n - 1$ links need be constructed. A minimal cost algorithm for solving
this problem (known as the minimal spanning tree problem) first constructs the
cheapest of all the $\binom{n}{2}$ links. Then, at each additional stage it chooses the cheapest
link that connects a city without any links to one with links. That is, if the first
link is between cities 1 and 2, then the second link will either be between 1 and
one of the cities $3, \ldots, n$ or between 2 and one of the cities $3, \ldots, n$. Suppose that
all of the $\binom{n}{2}$ costs $C_{ij}$ are independent exponential random variables with mean
1. Find the expected cost of the preceding algorithm if

(a) $n = 3$,

(b) $n = 4$.

*18. Let $X_1$ and $X_2$ be independent exponential random variables, each having rate $\mu$.
Let

$$X_{(1)} = \text{minimum}(X_1, X_2) \quad \text{and} \quad X_{(2)} = \text{maximum}(X_1, X_2)$$

Find

(a) $E[X_{(1)}]$,

(b) $\mathrm{Var}[X_{(1)}]$,

(c) $E[X_{(2)}]$,

(d) $\mathrm{Var}[X_{(2)}]$.

19. In a mile race between A and B, the time it takes A to complete the mile is an
exponential random variable with rate $\lambda_a$ and is independent of the time it takes
B to complete the mile, which is an exponential random variable with rate $\lambda_b$.





The one who finishes earliest is declared the winner and receives $Re^{-\alpha t}$ if the
winning time is $t$, where $R$ and $\alpha$ are constants. If the loser receives 0, find the
expected amount that runner A wins.

20. Consider a two-server system in which a customer is served first by server 1, then
by server 2, and then departs. The service times at server $i$ are exponential random
variables with rates $\mu_i$, $i = 1, 2$. When you arrive, you find server 1 free and two
customers at server 2: customer A in service and customer B waiting in line.

(a) Find $P_A$, the probability that A is still in service when you move over to
server 2.

(b) Find $P_B$, the probability that B is still in the system when you move over to
server 2.

(c) Find $E[T]$, where $T$ is the time that you spend in the system.

Hint: Write

$$T = S_1 + S_2 + W_A + W_B$$

where $S_i$ is your service time at server $i$, $W_A$ is the amount of time you wait in
queue while A is being served, and $W_B$ is the amount of time you wait in queue
while B is being served.

21. In a certain system, a customer must first be served by server 1 and then by server 2.
The service times at server $i$ are exponential with rate $\mu_i$, $i = 1, 2$. An arrival
finding server 1 busy waits in line for that server. Upon completion of service at
server 1, a customer either enters service with server 2 if that server is free or else
remains with server 1 (blocking any other customer from entering service) until
server 2 is free. Customers depart the system after being served by server 2. Suppose
that when you arrive there is one customer in the system and that customer is
being served by server 1. What is the expected total time you spend in the system?

22. Suppose in Exercise 21 you arrive to find two others in the system, one being
served by server 1 and one by server 2. What is the expected time you spend in
the system? Recall that if server 1 finishes before server 2, then server 1’s customer
will remain with him (thus blocking your entrance) until server 2 becomes free.

*23. A flashlight needs two batteries to be operational. Consider such a flashlight along
with a set of $n$ functional batteries: battery 1, battery 2, ..., battery $n$. Initially,
battery 1 and 2 are installed. Whenever a battery fails, it is immediately replaced
by the lowest numbered functional battery that has not yet been put in use. Suppose
that the lifetimes of the different batteries are independent exponential random
variables each having rate $\mu$. At a random time, call it $T$, a battery will fail and
our stockpile will be empty. At that moment exactly one of the batteries, which
we call battery $X$, will not yet have failed.

(a) What is $P\{X = n\}$?

(b) What is $P\{X = 1\}$?

(c) What is $P\{X = i\}$?

(d) Find $E[T]$.

(e) What is the distribution of $T$?

24. There are two servers available to process $n$ jobs. Initially, each server begins
work on a job. Whenever a server completes work on a job, that job leaves the
system and the server begins processing a new job (provided there are still jobs
waiting to be processed). Let $T$ denote the time until all jobs have been processed.
If the time that it takes server $i$ to process a job is exponentially distributed with
rate $\mu_i$, $i = 1, 2$, find $E[T]$ and $\mathrm{Var}(T)$.

25. Customers can be served by any of three servers, where the service times of
server $i$ are exponentially distributed with rate $\mu_i$, $i = 1, 2, 3$. Whenever a server
becomes free, the customer who has been waiting the longest begins service with
that server.

(a) If you arrive to find all three servers busy and no one waiting, find the expected 
time until you depart the system. 

(b) If you arrive to find all three servers busy and one person waiting, find the 
expected time until you depart the system. 

26. Each entering customer must be served first by server 1, then by server 2, and
finally by server 3. The amount of time it takes to be served by server $i$ is an
exponential random variable with rate $\mu_i$, $i = 1, 2, 3$. Suppose you enter the
system when it contains a single customer who is being served by server 3.

(a) Find the probability that server 3 will still be busy when you move over to
server 2.

(b) Find the probability that server 3 will still be busy when you move over to 
server 3. 

(c) Find the expected amount of time that you spend in the system. (Whenever 
you encounter a busy server, you must wait for the service in progress to end 
before you can enter service.) 

(d) Suppose that you enter the system when it contains a single customer who is 
being served by server 2. Find the expected amount of time that you spend in 
the system. 

27. Show, in Example 5.7, that the distributions of the total cost are the same for the 
two algorithms. 

28. Consider $n$ components with independent lifetimes, which are such that component
$i$ functions for an exponential time with rate $\lambda_i$. Suppose that all components
are initially in use and remain so until they fail.

(a) Find the probability that component 1 is the second component to fail.

(b) Find the expected time of the second failure.

Hint: Do not make use of part (a).

29. Let $X$ and $Y$ be independent exponential random variables with respective rates
$\lambda$ and $\mu$, where $\lambda > \mu$. Let $c > 0$.

(a) Show that the conditional density function of $X$, given that $X + Y = c$, is

$$f_{X \mid X+Y}(x \mid c) = \frac{(\lambda - \mu)e^{-(\lambda - \mu)x}}{1 - e^{-(\lambda - \mu)c}}, \qquad 0 < x < c$$

(b) Use part (a) to find $E[X \mid X + Y = c]$.

(c) Find $E[Y \mid X + Y = c]$.

30. The lifetimes of A’s dog and cat are independent exponential random variables
with respective rates $\lambda_d$ and $\lambda_c$. One of them has just died. Find the expected
additional lifetime of the other pet.

31. A doctor has scheduled two appointments, one at 1 P.M. and the other at 1:30 P.M. 
The amounts of time that appointments last are independent exponential random 
variables with mean 30 minutes. Assuming that both patients are on time, find the 
expected amount of time that the 1:30 appointment spends at the doctor’s office. 

32. Let X be a uniform random variable on (0, 1), and consider a counting process 
where events occur at times X + i, for i = 0, 1,2, .... 

(a) Does this counting process have independent increments? 

(b) Does this counting process have stationary increments? 

33. Let $X$ and $Y$ be independent exponential random variables with respective rates
$\lambda$ and $\mu$.

(a) Argue that, conditional on $X > Y$, the random variables $\min(X, Y)$ and $X - Y$
are independent.

(b) Use part (a) to conclude that for any positive constant $c$

$$\begin{aligned}
E[\min(X, Y) \mid X > Y + c] &= E[\min(X, Y) \mid X > Y] \\
&= E[\min(X, Y)] = \frac{1}{\lambda + \mu}
\end{aligned}$$

(c) Give a verbal explanation of why $\min(X, Y)$ and $X - Y$ are (unconditionally)
independent.

34. Two individuals, A and B, both require kidney transplants. If she does not receive
a new kidney, then A will die after an exponential time with rate $\mu_A$, and B after an
exponential time with rate $\mu_B$. New kidneys arrive in accordance with a Poisson
process having rate $\lambda$. It has been decided that the first kidney will go to A (or to
B if B is alive and A is not at that time) and the next one to B (if still living).

(a) What is the probability that A obtains a new kidney? 

(b) What is the probability that B obtains a new kidney? 

(c) What is the probability that neither A nor B obtains a new kidney? 

(d) What is the probability that both A and B obtain new kidneys? 

35. If {N(t),t ^ 0} is a Poisson process with rate X, verify that {N s (t),t > 0} 
satisfies the axioms for being a Poisson process with rate X, where N s (t ) = 
N(s + t) - N(s). 

*36. Let S(t) denote the price of a security at time t. A popular model for the process 
{S(t),t > 0} supposes that the price remains unchanged until a “shock” occurs, 
at which time the price is multiplied by a random factor. If we let N(t ) denote the 
number of shocks by time t, and let A, denote the ;th multiplicative factor, then 
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this model supposes that 

N(t) 

S(t ) = 5(0) Yl Xi 

i =1 

where X; is equal to 1 when /V(?) = 0. Suppose that the X, are independent 
exponential random variables with rate //; that {N (t), t ^ 0} is a Poisson process 
with rate A; that {N(t), t ^ 0} is independent of the X,-; and that 5(0) = .v. 

(a) Find E[S(t)]. 

(b) Find Zs[5 2 (t)]. 

37. A machine works for an exponentially distributed time with rate /i and then fails. 
A repair crew checks the machine at times distributed according to a Poisson 
process with rate A; if the machine is found to have failed then it is immediately 
replaced. Find the expected time between replacements of machines. 

38. Let {M_i(t), t ≥ 0}, i = 1, 2, 3 be independent Poisson processes with respective
rates λ_i, i = 1, 2, 3, and set

N_1(t) = M_1(t) + M_2(t), N_2(t) = M_2(t) + M_3(t)

The stochastic process {(N_1(t), N_2(t)), t ≥ 0} is called a bivariate Poisson process.

(a) Find P{N_1(t) = n, N_2(t) = m}.

(b) Find Cov(N_1(t), N_2(t)).

39. A certain scientific theory supposes that mistakes in cell division occur according 
to a Poisson process with rate 2.5 per year, and that an individual dies when 196 
such mistakes have occurred. Assuming this theory, find 

(a) the mean lifetime of an individual, 

(b) the variance of the lifetime of an individual. 

Also approximate 

(c) the probability that an individual dies before age 67.2, 

(d) the probability that an individual reaches age 90, 

(e) the probability that an individual reaches age 100. 

*40. Show that if {N_i(t), t ≥ 0} are independent Poisson processes with rate λ_i,
i = 1, 2, then {N(t), t ≥ 0} is a Poisson process with rate λ_1 + λ_2, where
N(t) = N_1(t) + N_2(t).

41. In Exercise 40 what is the probability that the first event of the combined process 
is from the N_1 process?

42. Let {N(t), t ≥ 0} be a Poisson process with rate λ. Let S_n denote the time of the
nth event. Find

(a) E[S_4],





(b) E[S_4 | N(1) = 2],

(c) E[N(4) − N(2) | N(1) = 3].

43. Customers arrive at a two-server service station according to a Poisson process
with rate λ. Whenever a new customer arrives, any customer that is in the
system immediately departs. A new arrival enters service first with server 1 and then
with server 2. If the service times at the servers are independent exponentials with
respective rates μ_1 and μ_2, what proportion of entering customers completes their
service with server 2?

44. Cars pass a certain street location according to a Poisson process with rate λ. A
woman who wants to cross the street at that location waits until she can see that 
no cars will come by in the next T time units. 

(a) Find the probability that her waiting time is 0. 

(b) Find her expected waiting time. 

Hint: Condition on the time of the first car. 

45. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the
nonnegative random variable T with mean μ and variance σ². Find

(a) Cov(T, N(T)),

(b) Var (N(T)). 

46. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the
sequence X_1, X_2, ... of independent and identically distributed random variables
with mean μ and variance σ². Find

$$\operatorname{Cov}\left(N(t), \sum_{i=1}^{N(t)} X_i\right)$$

47. Consider a two-server parallel queuing system where customers arrive according 
to a Poisson process with rate λ, and where the service times are exponential with
rate μ. Moreover, suppose that arrivals finding both servers busy immediately
depart without receiving any service (such a customer is said to be lost), whereas 
those finding at least one free server immediately enter service and then depart 
when their service is completed. 

(a) If both servers are presently busy, find the expected time until the next
customer enters the system.

(b) Starting empty, find the expected time until both servers are busy. 

(c) Find the expected time between two successive lost customers. 

48. Consider an n-server parallel queuing system where customers arrive according
to a Poisson process with rate λ, where the service times are exponential random
variables with rate μ, and where any arrival finding all servers busy immediately
departs without receiving any service. If an arrival finds all servers busy, find 

(a) the expected number of busy servers found by the next arrival, 

(b) the probability that the next arrival finds all servers free, 
(c) the probability that the next arrival finds exactly i of the servers free.

49. Events occur according to a Poisson process with rate λ. Each time an event
occurs, we must decide whether or not to stop, with our objective being to stop at
the last event to occur prior to some specified time T, where T > 1/λ. That is, if
an event occurs at time t, 0 ≤ t ≤ T, and we decide to stop, then we win if there
are no additional events by time T, and we lose otherwise. If we do not stop when
an event occurs and no additional events occur by time T, then we lose. Also,
if no events occur by time T, then we lose. Consider the strategy that stops at the
first event to occur after some fixed time s, 0 ≤ s ≤ T.

(a) Using this strategy, what is the probability of winning? 

(b) What value of s maximizes the probability of winning? 

(c) Show that one’s probability of winning when using the preceding strategy
with the value of s specified in part (b) is 1/e.

50. The number of hours between successive train arrivals at the station is uniformly 
distributed on (0,1). Passengers arrive according to a Poisson process with rate 
7 per hour. Suppose a train has just left the station. Let X denote the number of 
people who get on the next train. Find 

(a) E[X], 

(b) Var(X). 

51. If an individual has never had a previous automobile accident, then the probability
he or she has an accident in the next h time units is βh + o(h); on the other hand,
if he or she has ever had a previous accident, then the probability is αh + o(h).
Find the expected number of accidents an individual has by time t. 

52. Teams 1 and 2 are playing a match. The teams score points according to independent
Poisson processes with respective rates λ_1 and λ_2. If the match ends when
one of the teams has scored k more points than the other, find the probability that 
team 1 wins. 

Hint: Relate this to the gambler’s ruin problem. 

53. The water level of a certain reservoir is depleted at a constant rate of 1000 units 
daily. The reservoir is refilled by randomly occurring rainfalls. Rainfalls occur 
according to a Poisson process with rate 0.2 per day. The amount of water added 
to the reservoir by a rainfall is 5000 units with probability 0.8 or 8000 units with 
probability 0.2. The present water level is just slightly below 5000 units. 

(a) What is the probability the reservoir will be empty after five days? 

(b) What is the probability the reservoir will be empty sometime within the next 
ten days? 

54. A viral linear DNA molecule of length, say, 1 is often known to contain a certain
“marked position,” with the exact location of this mark being unknown. One
approach to locating the marked position is to cut the molecule by agents that
break it at points chosen according to a Poisson process with rate λ. It is then
possible to determine the fragment that contains the marked position. For instance,
letting m denote the location on the line of the marked position, then if L_1 denotes
the last Poisson event time before m (or 0 if there are no Poisson events in [0, m]),
and R_1 denotes the first Poisson event time after m (or 1 if there are no Poisson
events in [m, 1]), then it would be learned that the marked position lies between
L_1 and R_1. Find

(a) P{L_1 = 0},

(b) P{L_1 < x}, 0 < x < m,

(c) P{R_1 = 1},

(d) P{R_1 > x}, m < x < 1.

By repeating the preceding process on identical copies of the DNA molecule, 
we are able to zero in on the location of the marked position. If the cutting 
procedure is utilized on n identical copies of the molecule, yielding the data L_i, R_i,
i = 1, ..., n, then it follows that the marked position lies between L and R, where

$$L = \max_i L_i, \qquad R = \min_i R_i$$

(e) Find E[R − L], and in doing so, show that E[R − L] ∼ 2/(nλ).

55. Consider a single server queuing system where customers arrive according to a
Poisson process with rate λ, service times are exponential with rate μ, and
customers are served in the order of their arrival. Suppose that a customer arrives
and finds n — 1 others in the system. Let X denote the number in the system at 
the moment that customer departs. Find the probability mass function of X. 

56. An event independently occurs on each day with probability p. Let N(n) denote 
the total number of events that occur on the first n days, and let T_r denote the day
on which the rth event occurs.

(a) What is the distribution of N(n)?

(b) What is the distribution of T_1?

(c) What is the distribution of T_r?

(d) Given that N(n) = r, show that the set of r days on which events occurred 
has the same distribution as a random selection (without replacement) of r of 
the values 1,2, ..., n. 

*57. Events occur according to a Poisson process with rate λ = 2 per hour.

(a) What is the probability that no event occurs between 8 P.M. and 9 P.M.? 

(b) Starting at noon, what is the expected time at which the fourth event occurs? 

(c) What is the probability that two or more events occur between 6 P.M. and 8 P.M.? 

58. Each round played by a contestant is either a success with probability p or a failure 
with probability 1 — p. If the round is a success, then a random amount of money 
having an exponential distribution with rate λ is won. If the round is a failure,
then the contestant loses everything that had been accumulated up to that time 
and cannot play any additional rounds. After a successful round, the contestant 
can either elect to quit playing and keep whatever has already been won or can 
elect to play another round. Suppose that a newly starting contestant plans on
continuing to play until either her total winnings exceed t or a failure occurs.

(a) What is the distribution of N, equal to the number of successful rounds that
it would take until her fortune exceeds t?

(b) What is the probability the contestant will be successful in reaching a fortune
of at least t?

(c) Given the contestant is successful, what is her expected winnings? 

(d) What is the expected value of the contestant’s winnings? 

59. There are two types of claims that are made to an insurance company. Let N_i(t)
denote the number of type i claims made by time t, and suppose that {N_1(t), t ≥ 0}
and {N_2(t), t ≥ 0} are independent Poisson processes with rates λ_1 = 10 and
λ_2 = 1. The amounts of successive type 1 claims are independent exponential
random variables with mean $1000 whereas the amounts from type 2 claims are 
independent exponential random variables with mean $5000. A claim for $4000 
has just been received; what is the probability it is a type 1 claim? 

*60. Customers arrive at a bank at a Poisson rate λ. Suppose two customers arrived
during the first hour. What is the probability that 

(a) both arrived during the first 20 minutes? 

(b) at least one arrived during the first 20 minutes? 

61. A system has a random number of flaws that we will suppose is Poisson
distributed with mean c. Each of these flaws will, independently, cause the system
to fail at a random time having distribution G. When a system failure occurs, 
suppose that the flaw causing the failure is immediately located and fixed. 

(a) What is the distribution of the number of failures by time t?

(b) What is the distribution of the number of flaws that remain in the system at
time t?

(c) Are the random variables in parts (a) and (b) dependent or independent? 

62. Suppose that the number of typographical errors in a new text is Poisson
distributed with mean λ. Two proofreaders independently read the text. Suppose that
each error is independently found by proofreader i with probability p_i, i = 1, 2.
Let X_1 denote the number of errors that are found by proofreader 1 but not by
proofreader 2. Let X_2 denote the number of errors that are found by proofreader
2 but not by proofreader 1. Let X_3 denote the number of errors that are found by
both proofreaders. Finally, let X_4 denote the number of errors found by neither
proofreader.

(a) Describe the joint probability distribution of X_1, X_2, X_3, X_4.

(b) Show that

$$\frac{E[X_1]}{E[X_3]} = \frac{1 - p_2}{p_2} \quad \text{and} \quad \frac{E[X_2]}{E[X_3]} = \frac{1 - p_1}{p_1}$$

Suppose now that λ, p_1, and p_2 are all unknown.

(c) By using X_i as an estimator of E[X_i], i = 1, 2, 3, present estimators of
p_1, p_2, and λ.

(d) Give an estimator of X 4 , the number of errors not found by either proofreader. 

63. Consider an infinite server queuing system in which customers arrive in
accordance with a Poisson process with rate λ, and where the service distribution is
exponential with rate μ. Let X(t) denote the number of customers in the system
at time t. Find

(a) E[X(t + s) | X(s) = n];

(b) Var[X(t + s) | X(s) = n].

Hint: Divide the customers in the system at time t + s into two groups, one
consisting of “old” customers and the other of “new” customers.

(c) Consider an infinite server queuing system in which customers arrive
according to a Poisson process with rate λ, and where the service times are all
exponential random variables with rate μ. If there is currently a single
customer in the system, find the probability that the system becomes empty when
that customer departs.

*64. Suppose that people arrive at a bus stop in accordance with a Poisson process
with rate λ. The bus departs at time t. Let X denote the total amount of waiting
time of all those who get on the bus at time t. We want to determine Var(X). Let
N(t) denote the number of arrivals by time t.

(a) What is E[X | N(t)]?

(b) Argue that Var[X | N(t)] = N(t)t²/12.

(c) What is Var(X)? 

65. An average of 500 people pass the California bar exam each year. A California 
lawyer practices law, on average, for 30 years. Assuming these numbers remain 
steady, how many lawyers would you expect California to have in 2050? 

66. Policyholders of a certain insurance company have accidents at times distributed
according to a Poisson process with rate λ. The amount of time from when the
accident occurs until a claim is made has distribution G. 

(a) Find the probability there are exactly n incurred but as yet unreported claims 
at time t. 

(b) Suppose that each claim amount has distribution F, and that the claim amount 
is independent of the time that it takes to report the claim. Find the expected 
value of the sum of all incurred but as yet unreported claims at time t. 

67. Satellites are launched into space at times distributed according to a Poisson
process with rate λ. Each satellite independently spends a random time (having
distribution G) in space before falling to the ground. Find the probability that
none of the satellites in the air at time t was launched before time s, where s < t. 

68. Suppose that electrical shocks having random amplitudes occur at times
distributed according to a Poisson process {N(t), t ≥ 0} with rate λ. Suppose that
the amplitudes of the successive shocks are independent both of other amplitudes
and of the arrival times of shocks, and also that the amplitudes have distribution
F with mean μ. Suppose also that the amplitude of a shock decreases with time at
an exponential rate α, meaning that an initial amplitude A will have value Ae^{−αx}
after an additional time x has elapsed. Let A(t) denote the sum of all amplitudes
at time t. That is,

$$A(t) = \sum_{i=1}^{N(t)} A_i e^{-\alpha(t - S_i)}$$

where A_i and S_i are the initial amplitude and the arrival time of shock i.

(a) Find E[A(t)] by conditioning on N(t). 

(b) Without any computations, explain why A(t) has the same distribution as 
does D(t) of Example 5.21. 

69. Suppose in Example 5.19 that a car can overtake a slower moving car without 
any loss of speed. Suppose a car that enters the road at time s has a free travel
time equal to t_0. Find the distribution of the total number of other cars that it
encounters on the road (either by passing or by being passed). 

70. For the infinite server queue with Poisson arrivals and general service distribution 
G, find the probability that 

(a) the first customer to arrive is also the first to depart. 

Let S(t) equal the sum of the remaining service times of all customers in the 
system at time t. 

(b) Argue that S(t) is a compound Poisson random variable.

(c) Find E[S(t)].

(d) Find Var(S(t)).

71. Let S_n denote the time of the nth event of the Poisson process {N(t), t ≥ 0}
having rate λ. Show, for an arbitrary function g, that the random variable
$\sum_{i=1}^{N(t)} g(S_i)$ has the same distribution as the compound Poisson random
variable $\sum_{i=1}^{N} g(U_i)$, where U_1, U_2, ... is a sequence of independent and
identically distributed uniform (0, t) random variables that is independent of N,
a Poisson random variable with mean λt. Consequently, conclude that



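$$E\left[\sum_{i=1}^{N(t)} g(S_i)\right] = \lambda \int_0^t g(x)\,dx, \qquad \operatorname{Var}\left(\sum_{i=1}^{N(t)} g(S_i)\right) = \lambda \int_0^t g^2(x)\,dx$$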


72. A cable car starts off with n riders. The times between successive stops of the 
car are independent exponential random variables with rate λ. At each stop one
rider gets off. This takes no time, and no additional riders get on. After a rider
gets off the car, he or she walks home. Independently of all else, the walk takes
an exponential time with rate μ.

(a) What is the distribution of the time at which the last rider departs the car? 
(b) Suppose the last rider departs the car at time t. What is the probability that
all the other riders are home at that time? 

73. Shocks occur according to a Poisson process with rate λ, and each shock
independently causes a certain system to fail with probability p. Let T denote the time
at which the system fails and let N denote the number of shocks that it takes.

(a) Find the conditional distribution of T given that N = n.

(b) Calculate the conditional distribution of N, given that T = t, and notice that
it is distributed as 1 plus a Poisson random variable with mean λ(1 − p)t.

(c) Explain how the result in part (b) could have been obtained without any 
calculations. 

74. The number of missing items in a certain location, call it X, is a Poisson random 
variable with mean λ. When searching the location, each item will independently
be found after an exponentially distributed time with rate μ. A reward of R is
received for each item found, and a searching cost of C per unit of search time is 
incurred. Suppose that you search for a fixed time t and then stop. 

(a) Find your total expected return. 

(b) Find the value of t that maximizes the total expected return. 

(c) The policy of searching for a fixed time is a static policy. Would a dynamic
policy, which allows the decision as to whether to stop at each time t to depend
on the number already found by t, be beneficial?

Hint: How does the distribution of the number of items not yet found by time t 
depend on the number already found by that time? 

75. Suppose that the times between successive arrivals of customers at a single-server 
station are independent random variables having a common distribution F.
Suppose that when a customer arrives, he or she either immediately enters service if
the server is free or else joins the end of the waiting line if the server is busy with
another customer. When the server completes work on a customer, that customer
leaves the system and the next waiting customer, if there are any, enters service.
Let X_n denote the number of customers in the system immediately before the
nth arrival, and let Y_n denote the number of customers that remain in the
system when the nth customer departs. The successive service times of customers
are independent random variables (which are also independent of the interarrival
times) having a common distribution G.

(a) If F is the exponential distribution with rate λ, which, if any, of the processes
{X_n}, {Y_n} is a Markov chain?

(b) If G is the exponential distribution with rate μ, which, if any, of the processes
{X_n}, {Y_n} is a Markov chain?

(c) Give the transition probabilities of any Markov chains in parts (a) and (b). 

76. For the model of Example 5.27, find the mean and variance of the number of 
customers served in a busy period. 

77. Suppose that customers arrive to a system according to a Poisson process with 
rate λ. There are an infinite number of servers in this system so a customer begins
service upon arrival. The service times of the arrivals are independent exponential
random variables with rate μ, and are independent of the arrival process.
Customers depart the system when their service ends. Let N be the number of arrivals
before the first departure. 

(a) Find P(N = 1). 

(b) Find P(N = 2). 

(c) Find P(N = j). 

(d) Find the probability that the first to arrive is the first to depart. 

(e) Find the expected time of the first departure. 


78. A store opens at 8 A.M. From 8 until 10 A.M. customers arrive at a Poisson rate of 
four an hour. Between 10 A.M. and 12 P.M. they arrive at a Poisson rate of eight an 
hour. From 12 P.M. to 2 P.M. the arrival rate increases steadily from eight per hour 
at 12 P.M. to ten per hour at 2 P.M.; and from 2 to 5 P.M. the arrival rate drops steadily 
from ten per hour at 2 P.M. to four per hour at 5 P.M. Determine the probability
distribution of the number of customers that enter the store on a given day. 

*79. Suppose that events occur according to a nonhomogeneous Poisson process with
intensity function λ(t), t > 0. Further, suppose that an event that occurs at time
s is a type 1 event with probability p(s), s > 0. If N_1(t) is the number of type 1
events by time t, what type of process is {N_1(t), t ≥ 0}?

80. Let T_1, T_2, ... denote the interarrival times of events of a nonhomogeneous
Poisson process having intensity function λ(t).

(a) Are the T_i independent?

(b) Are the T_i identically distributed?

(c) Find the distribution of T_1.

81. (a) Let {N(t), t ≥ 0} be a nonhomogeneous Poisson process with mean value
function m(t). Given N(t) = n, show that the unordered set of arrival times
has the same distribution as n independent and identically distributed random
variables having distribution function

$$F(x) = \begin{cases} \dfrac{m(x)}{m(t)}, & x \le t \\ 1, & x \ge t \end{cases}$$

(b) Suppose that workmen incur accidents in accordance with a nonhomogeneous
Poisson process with mean value function m(t). Suppose further that each
injured man is out of work for a random amount of time having distribution F.
Let X(t) be the number of workers who are out of work at time t. By using
part (a), find E[X(t)].


82. Let X_1, X_2, ... be independent positive continuous random variables with a
common density function f, and suppose this sequence is independent of N, a Poisson
random variable with mean λ. Define

N(t) = number of i ≤ N : X_i ≤ t

Show that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity
function λ(t) = λf(t).
83. Suppose that {N_0(t), t ≥ 0} is a Poisson process with rate λ = 1. Let λ(t) denote
a nonnegative function of t, and let

$$m(t) = \int_0^t \lambda(s)\,ds$$

Define N(t) by

N(t) = N_0(m(t))

Argue that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity
function λ(t), t ≥ 0.

Hint: Make use of the identity

m(t + h) − m(t) = m′(t)h + o(h)

*84. Let X_1, X_2, ... be independent and identically distributed nonnegative continuous
random variables having density function f(x). We say that a record occurs at
time n if X_n is larger than each of the previous values X_1, ..., X_{n−1}. (A record
automatically occurs at time 1.) If a record occurs at time n, then X_n is called a
record value. In other words, a record occurs whenever a new high is reached, and
that new high is called the record value. Let N(t) denote the number of record
values that are less than or equal to t. Characterize the process {N(t), t ≥ 0} when

(a) f is an arbitrary continuous density function.

(b) f(x) = λe^{−λx}.

Hint: Finish the following sentence: There will be a record whose value is
between t and t + dt if the first X_i that is greater than t lies between ...

85. An insurance company pays out claims on its life insurance policies in accordance 
with a Poisson process having rate λ = 5 per week. If the amount of money paid
on each policy is exponentially distributed with mean $2000, what is the mean and 
variance of the amount of money paid by the insurance company in a four-week 
span? 

86. In good years, storms occur according to a Poisson process with rate 3 per unit
time, while in other years they occur according to a Poisson process with rate
5 per unit time. Suppose next year will be a good year with probability 0.3. Let
N(t) denote the number of storms during the first t time units of next year.

(a) Find P{N(t) = n}.

(b) Is {N(t)} a Poisson process?

(c) Does {N(t)} have stationary increments? Why or why not?

(d) Does it have independent increments? Why or why not? 

(e) If next year starts off with three storms by time t = 1, what is the conditional 
probability it is a good year? 

87. Determine 

Cov[X(t), X(t + s)]

when {X(t), t ≥ 0} is a compound Poisson process.
88. Customers arrive at the automatic teller machine in accordance with a Poisson
process with rate 12 per hour. The amount of money withdrawn on each transaction
is a random variable with mean $30 and standard deviation $50. (A negative
withdrawal means that money was deposited.) The machine is in use for 15 hours daily.
Approximate the probability that the total daily withdrawal is less than $6000. 

89. Some components of a two-component system fail after receiving a shock. Shocks 
of three types arrive independently and in accordance with Poisson processes. 
Shocks of the first type arrive at a Poisson rate λ_1 and cause the first component
to fail. Those of the second type arrive at a Poisson rate λ_2 and cause the second
component to fail. The third type of shock arrives at a Poisson rate λ_3 and causes
both components to fail. Let X_1 and X_2 denote the survival times for the two
components. Show that the joint distribution of X_1 and X_2 is given by

$$P\{X_1 > s, X_2 > t\} = \exp\{-\lambda_1 s - \lambda_2 t - \lambda_3 \max(s, t)\}$$


This distribution is known as the bivariate exponential distribution. 

90. In Exercise 89 show that X_1 and X_2 both have exponential distributions.

*91. Let X_1, X_2, ..., X_n be independent and identically distributed exponential
random variables. Show that the probability that the largest of them is greater than
the sum of the others is n/2^{n−1}. That is, if

$$M = \max_j X_j$$

then show

$$P\left\{M > \sum_{i=1}^{n} X_i - M\right\} = \frac{n}{2^{n-1}}$$

Hint: What is $P\{X_1 > \sum_{i=2}^{n} X_i\}$?

92. Prove Equation (5.22). 

93. Prove that 


(a) max(X_1, X_2) = X_1 + X_2 − min(X_1, X_2) and, in general,

(b) $$\max(X_1, \ldots, X_n) = \sum_{i=1}^{n} X_i - \sum\sum_{i<j} \min(X_i, X_j) + \sum\sum\sum_{i<j<k} \min(X_i, X_j, X_k) + \cdots + (-1)^{n-1} \min(X_1, X_2, \ldots, X_n)$$

(c) Show by defining appropriate random variables X_i, i = 1, ..., n, and by
taking expectations in part (b) how to obtain the well-known formula

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_i P(A_i) - \sum\sum_{i<j} P(A_i A_j) + \cdots + (-1)^{n-1} P(A_1 \cdots A_n)$$

(d) Consider n independent Poisson processes, the ith having rate λ_i. Derive an
expression for the expected time until an event has occurred in all n processes. 

94. A two-dimensional Poisson process is a process of randomly occurring events in 
the plane such that 

(i) for any region of area A the number of events in that region has a Poisson 
distribution with mean λA, and

(ii) the number of events in nonoverlapping regions are independent. 

For such a process, consider an arbitrary point in the plane and let X denote its 
distance from its nearest event (where distance is measured in the usual Euclidean 
manner). Show that 

(a) P{X > t} = e^{−λπt²},

(b) E[X] = 1/(2√λ).

95. Let {N(t), t ≥ 0} be a conditional Poisson process with a random rate L.

(a) Derive an expression for E[L | N(t) = n].

(b) Find, for s > t, E[N(s) | N(t) = n].

(c) Find, for s < t, E[N(s) | N(t) = n].

96. For the conditional Poisson process, let m_1 = E[L], m_2 = E[L²]. In terms of
m_1 and m_2, find Cov(N(s), N(t)) for s ≤ t.

97. Consider a conditional Poisson process in which the rate L is, as in Example 
5.29, gamma distributed with parameters m and p. Find the conditional density 
function of L given that N(t) = n. 

98. Let M(t) = E[D(t)] in Example 5.21. 

(a) Show that 

$$M(t + h) = M(t) + e^{-\alpha t}\lambda h \mu + o(h)$$

(b) Use (a) to show that

$$M'(t) = \lambda\mu e^{-\alpha t}$$

(c) Show that

$$M(t) = \frac{\lambda\mu}{\alpha}\left(1 - e^{-\alpha t}\right)$$

99. Let X be the time between the first and the second event of a Hawkes process 
with mark distribution F. Find P(X > t). 
Continuous-Time Markov Chains



6.1 Introduction 

In this chapter we consider a class of probability models that has a wide variety of 
applications in the real world. The members of this class are the continuous-time analogs 
of the Markov chains of Chapter 4 and as such are characterized by the Markovian 
property that, given the present state, the future is independent of the past. 

One example of a continuous-time Markov chain has already been met. This is the
Poisson process of Chapter 5. For if we let the total number of arrivals by time t (that
is, N(t)) be the state of the process at time t, then the Poisson process is a
continuous-time Markov chain having states 0, 1, 2, ... that always proceeds from state n to state
n + 1, where n ≥ 0. Such a process is known as a pure birth process since when a
transition occurs the state of the system is always increased by one. More generally, an
exponential model that can go (in one transition) only from state n to either state n − 1
or state n + 1 is called a birth and death model. For such a model, transitions from
state n to state n + 1 are designated as births, and those from n to n − 1 as deaths. Birth
and death models have wide applicability in the study of biological systems and in the 
study of waiting line systems in which the state represents the number of customers in 
the system. These models will be studied extensively in this chapter. 

In Section 6.2 we define continuous-time Markov chains and then relate them to 
the discrete-time Markov chains of Chapter 4. In Section 6.3 we consider birth and 
death processes and in Section 6.4 we derive two sets of differential equations—the 
forward and backward equations—that describe the probability laws for the system. 
The material in Section 6.5 is concerned with determining the limiting (or long-run) 
probabilities connected with a continuous-time Markov chain. In Section 6.6 we
consider the topic of time reversibility. We show that all birth and death processes are time
reversible, and then illustrate the importance of this observation to queueing systems.
In the final section we show how to “uniformize” Markov chains, a technique useful 
for numerical computations. 


6.2 Continuous-Time Markov Chains 

Suppose we have a continuous-time stochastic process {X(t), t ≥ 0} taking on values in
the set of nonnegative integers. In analogy with the definition of a discrete-time Markov
chain, given in Chapter 4, we say that the process {X(t), t ≥ 0} is a continuous-time
Markov chain if for all s, t ≥ 0 and nonnegative integers i, j, x(u), 0 ≤ u < s

P{X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s}

= P{X(t + s) = j | X(s) = i}

In other words, a continuous-time Markov chain is a stochastic process having the
Markovian property that the conditional distribution of the future X(t + s) given the
present X(s) and the past X(u), 0 ≤ u < s, depends only on the present and is
independent of the past. If, in addition,

P{X(t + s) = j | X(s) = i}

is independent of s, then the continuous-time Markov chain is said to have stationary 
or homogeneous transition probabilities. 

All Markov chains considered in this text will be assumed to have stationary
transition probabilities.

Suppose that a continuous-time Markov chain enters state i at some time, say, time 
0, and suppose that the process does not leave state i (that is, a transition does not 
occur) during the next ten minutes. What is the probability that the process will not 
leave state i during the following five minutes? Since the process is in state i at time 10 
it follows, by the Markovian property, that the probability that it remains in that state 
during the interval [10,15] is just the (unconditional) probability that it stays in state i 
for at least five minutes. That is, if we let $T_i$ denote the amount of time that the process
stays in state i before making a transition into a different state, then

\[ P\{T_i > 15 \mid T_i > 10\} = P\{T_i > 5\} \]

or, in general, by the same reasoning,

\[ P\{T_i > s + t \mid T_i > s\} = P\{T_i > t\} \]

for all $s, t \ge 0$. Hence, the random variable $T_i$ is memoryless and must thus (see
Section 5.2.2) be exponentially distributed.

In fact, the preceding gives us another way of defining a continuous-time Markov 
chain. Namely, it is a stochastic process having the properties that each time it enters 
state i 

(i) the amount of time it spends in that state before making a transition into a different
state is exponentially distributed with mean, say, $1/v_i$, and
(ii) when the process leaves state i, it next enters state j with some probability, say,
$P_{ij}$. Of course, the $P_{ij}$ must satisfy

\[ P_{ii} = 0, \quad \text{all } i \]
\[ \sum_j P_{ij} = 1, \quad \text{all } i \]

In other words, a continuous-time Markov chain is a stochastic process that moves 
from state to state in accordance with a (discrete-time) Markov chain, but is such 
that the amount of time it spends in each state, before proceeding to the next state, is 
exponentially distributed. In addition, the amount of time the process spends in state i, 
and the next state visited, must be independent random variables. For if the next state 
visited were dependent on $T_i$, then information as to how long the process has already
been in state i would be relevant to the prediction of the next state—and this contradicts 
the Markovian assumption. 
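
The two-part description above (exponential holding times, then an independent jump chosen from the $P_{ij}$) translates directly into a simulation loop. The following is a minimal sketch under stated assumptions: the function name `simulate_ctmc`, the finite state space, and the three-state rates and jump probabilities are all hypothetical choices made for illustration, not part of the text.

```python
import random


def simulate_ctmc(v, P, start, horizon, rng):
    """Simulate one path of a continuous-time Markov chain on states 0..n-1.

    v[i]    -- rate of the exponential holding time in state i (mean 1 / v[i])
    P[i][j] -- probability that the chain moves to j when it leaves i (P[i][i] == 0)
    Returns the list of (entry_time, state) pairs with entry_time < horizon.
    """
    time, state = 0.0, start
    path = [(time, state)]
    while True:
        # (i) exponential holding time in the current state
        time += rng.expovariate(v[state])
        if time >= horizon:
            return path
        # (ii) next state drawn from row `state` of P, independently of the holding time
        state = rng.choices(range(len(P[state])), weights=P[state])[0]
        path.append((time, state))


# Hypothetical three-state chain; the numbers are for illustration only.
v = [1.0, 2.0, 0.5]
P = [[0.0, 0.7, 0.3],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
path = simulate_ctmc(v, P, start=0, horizon=10.0, rng=random.Random(1))
```

Because the holding time and the next state are drawn separately, the sketch respects the independence requirement discussed above.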

Example 6.1 (A Shoe Shine Shop) Consider a shoe shine establishment consisting 
of two chairs—chair 1 and chair 2. A customer upon arrival goes initially to chair 1 
where his shoes are cleaned and polish is applied. After this is done the customer moves 
on to chair 2 where the polish is buffed. The service times at the two chairs are assumed 
to be independent random variables that are exponentially distributed with respective 
rates $\mu_1$ and $\mu_2$. Suppose that potential customers arrive in accordance with a Poisson
process having rate $\lambda$, and that a potential customer will enter the system only if both
chairs are empty. 

The preceding model can be analyzed as a continuous-time Markov chain, but first 
we must decide upon an appropriate state space. Since a potential customer will enter 
the system only if there are no other customers present, it follows that there will always 
either be 0 or 1 customers in the system. However, if there is 1 customer in the system, 
then we would also need to know which chair he was presently in. Hence, an appropriate 
state space might consist of the three states 0, 1, and 2 where the states have the 
following interpretation: 

State   Interpretation
0       system is empty
1       a customer is in chair 1
2       a customer is in chair 2

We leave it as an exercise for you to verify that

\[ v_0 = \lambda, \quad v_1 = \mu_1, \quad v_2 = \mu_2, \]
\[ P_{01} = P_{12} = P_{20} = 1 \] ■

6.3 Birth and Death Processes 

Consider a system whose state at any time is represented by the number of people in the 
system at that time. Suppose that whenever there are n people in the system, then (i) new
arrivals enter the system at an exponential rate $\lambda_n$, and (ii) people leave the system at an
exponential rate $\mu_n$. That is, whenever there are n persons in the system, then the time
until the next arrival is exponentially distributed with mean $1/\lambda_n$ and is independent
of the time until the next departure, which is itself exponentially distributed with mean
$1/\mu_n$. Such a system is called a birth and death process. The parameters $\{\lambda_n\}_{n=0}^{\infty}$ and
$\{\mu_n\}_{n=1}^{\infty}$ are called, respectively, the arrival (or birth) and departure (or death) rates.

Thus, a birth and death process is a continuous-time Markov chain with states 
{0, 1, ...} for which transitions from state n may go only to either state n - 1 or state
n + 1. The relationships between the birth and death rates and the state transition rates
and probabilities are

\[ v_0 = \lambda_0, \]
\[ v_i = \lambda_i + \mu_i, \quad i > 0 \]
\[ P_{01} = 1, \]
\[ P_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i}, \quad i > 0 \]
\[ P_{i,i-1} = \frac{\mu_i}{\lambda_i + \mu_i}, \quad i > 0 \]


The preceding follows, because if there are i in the system, then the next state will be
$i + 1$ if a birth occurs before a death, and the probability that an exponential random
variable with rate $\lambda_i$ will occur earlier than an (independent) exponential with rate $\mu_i$ is
$\lambda_i/(\lambda_i + \mu_i)$. Moreover, the time until either a birth or a death occurs is exponentially
distributed with rate $\lambda_i + \mu_i$ (and so, $v_i = \lambda_i + \mu_i$).
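
As a small illustration of these relationships, the sketch below converts lists of birth and death rates into the holding-time rates $v_i$ and the jump probabilities. The function name `birth_death_parameters`, the truncation to finitely many states, and the numerical rates are assumptions made for the example.

```python
def birth_death_parameters(birth, death):
    """Return (v, P) for a birth and death process restricted to states 0..N.

    birth[i] -- birth rate lambda_i in state i
    death[i] -- death rate mu_i in state i (death[0] must be 0)
    v[i]     -- total rate of leaving state i
    P[i]     -- dict mapping each possible next state to its probability
    """
    v, P = [], []
    for i, (lam, mu) in enumerate(zip(birth, death)):
        total = lam + mu
        row = {}
        if lam > 0:
            row[i + 1] = lam / total  # a birth occurs before a death
        if mu > 0:
            row[i - 1] = mu / total   # a death occurs before a birth
        v.append(total)
        P.append(row)
    return v, P


# Illustrative rates only: lambda_i = 2 for every i, mu_i = 3 for i >= 1.
v, P = birth_death_parameters([2.0, 2.0, 2.0], [0.0, 3.0, 3.0])
```

Note that the last listed state may still jump upward to a state outside the list; the sketch only reports the parameters of the states it was given.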

Example 6.2 (The Poisson Process) Consider a birth and death process for which 

\[ \mu_n = 0, \quad \text{for all } n \ge 0 \]
\[ \lambda_n = \lambda, \quad \text{for all } n \ge 0 \]

This is a process in which departures never occur, and the time between successive
arrivals is exponential with mean $1/\lambda$. Hence, this is just the Poisson process. ■

A birth and death process for which $\mu_n = 0$ for all n is called a pure birth process.
Another pure birth process is given by the next example. 

Example 6.3 (A Birth Process with Linear Birth Rate) Consider a population 
whose members can give birth to new members but cannot die. If each member acts 
independently of the others and takes an exponentially distributed amount of time, with 
mean $1/\lambda$, to give birth, then if $X(t)$ is the population size at time t, then $\{X(t), t \ge 0\}$ is
a pure birth process with $\lambda_n = n\lambda$, $n \ge 0$. This follows since if the population consists
of n persons and each gives birth at an exponential rate $\lambda$, then the total rate at which
births occur is $n\lambda$. This pure birth process is known as a Yule process after G. Yule,
who used it in his mathematical theory of evolution. ■

Example 6.4 (A Linear Growth Model with Immigration) A model in which

\[ \mu_n = n\mu, \quad n \ge 1 \]
\[ \lambda_n = n\lambda + \theta, \quad n \ge 0 \]

is called a linear growth process with immigration. Such processes occur naturally
in the study of biological reproduction and population growth. Each individual in the
population is assumed to give birth at an exponential rate $\lambda$; in addition, there is an
exponential rate of increase $\theta$ of the population due to an external source such as
immigration. Hence, the total birth rate where there are n persons in the system is
$n\lambda + \theta$. Deaths are assumed to occur at an exponential rate $\mu$ for each member of the
population, so $\mu_n = n\mu$.

Let $X(t)$ denote the population size at time t. Suppose that $X(0) = i$ and let

\[ M(t) = E[X(t)] \]

We will determine M(t) by deriving and then solving a differential equation that it 
satisfies. 

We start by deriving an equation for M(t + h) by conditioning on X(t). This yields 

\[ M(t+h) = E[X(t+h)] = E\big[E[X(t+h) \mid X(t)]\big] \]


Now, given the size of the population at time t then, ignoring events whose probability
is $o(h)$, the population at time $t + h$ will either increase in size by 1 if a birth or an
immigration occurs in $(t, t + h)$, or decrease by 1 if a death occurs in this interval, or
remain the same if neither of these two possibilities occurs. That is, given $X(t)$,

\[ X(t+h) = \begin{cases}
X(t) + 1, & \text{with probability } [\theta + X(t)\lambda]h + o(h) \\
X(t) - 1, & \text{with probability } X(t)\mu h + o(h) \\
X(t), & \text{with probability } 1 - [\theta + X(t)\lambda + X(t)\mu]h + o(h)
\end{cases} \]

Therefore,

\[ E[X(t+h) \mid X(t)] = X(t) + [\theta + X(t)\lambda - X(t)\mu]h + o(h) \]

Taking expectations yields

\[ M(t+h) = M(t) + (\lambda - \mu)M(t)h + \theta h + o(h) \]

or, equivalently,

\[ \frac{M(t+h) - M(t)}{h} = (\lambda - \mu)M(t) + \theta + \frac{o(h)}{h} \]

Taking the limit as $h \to 0$ yields the differential equation

\[ M'(t) = (\lambda - \mu)M(t) + \theta \tag{6.1} \]

If we now define the function $h(t)$ by

\[ h(t) = (\lambda - \mu)M(t) + \theta \]

then

\[ h'(t) = (\lambda - \mu)M'(t) \]

Therefore, Differential Equation (6.1) can be rewritten as

\[ \frac{h'(t)}{\lambda - \mu} = h(t) \]

or

\[ \frac{h'(t)}{h(t)} = \lambda - \mu \]

Integration yields

\[ \log[h(t)] = (\lambda - \mu)t + c \]

or

\[ h(t) = Ke^{(\lambda - \mu)t} \]

Putting this back in terms of $M(t)$ gives

\[ \theta + (\lambda - \mu)M(t) = Ke^{(\lambda - \mu)t} \]

To determine the value of the constant K, we use the fact that $M(0) = i$ and evaluate
the preceding at $t = 0$. This gives

\[ \theta + (\lambda - \mu)i = K \]

Substituting this back in the preceding equation for $M(t)$ yields the following solution
for $M(t)$:

\[ M(t) = \frac{\theta}{\lambda - \mu}\left[e^{(\lambda - \mu)t} - 1\right] + ie^{(\lambda - \mu)t} \]

Note that we have implicitly assumed that $\lambda \ne \mu$. If $\lambda = \mu$, then Differential Equation
(6.1) reduces to

\[ M'(t) = \theta \tag{6.2} \]

Integrating (6.2) and using that $M(0) = i$ gives the solution

\[ M(t) = \theta t + i \]
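
One way to gain confidence in the solution for $M(t)$ is to evaluate it numerically and check that it satisfies Differential Equation (6.1). The sketch below does this with a central difference; the function names `mean_population` and `ode_residual`, the step size, and the parameter values are illustrative assumptions.

```python
import math


def mean_population(t, lam, mu, theta, i):
    """M(t) = E[X(t)] for the linear growth model with immigration, X(0) = i."""
    if lam == mu:
        return theta * t + i
    growth = math.exp((lam - mu) * t)
    return theta / (lam - mu) * (growth - 1.0) + i * growth


def ode_residual(t, lam, mu, theta, i, h=1e-6):
    """Central-difference estimate of M'(t) minus the right side of (6.1)."""
    derivative = (mean_population(t + h, lam, mu, theta, i)
                  - mean_population(t - h, lam, mu, theta, i)) / (2.0 * h)
    return derivative - ((lam - mu) * mean_population(t, lam, mu, theta, i) + theta)


# Illustrative parameter values; the residual should be close to zero.
residual = ode_residual(t=1.5, lam=0.8, mu=1.1, theta=2.0, i=5)
```

The same check applies to the case $\lambda = \mu$, where the function falls back to $\theta t + i$.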
Example 6.5 (The Queueing System M/M/1) Suppose that customers arrive at a
single-server service station in accordance with a Poisson process having rate $\lambda$. That
is, the times between successive arrivals are independent exponential random variables
having mean $1/\lambda$. Upon arrival, each customer goes directly into service if the server is
free; if not, then the customer joins the queue (that is, he waits in line). When the server
finishes serving a customer, the customer leaves the system and the next customer
in line, if there are any waiting, enters the service. The successive service times are
assumed to be independent exponential random variables having mean $1/\mu$.

The preceding is known as the M/M/1 queueing system. The first M refers to the
fact that the interarrival process is Markovian (since it is a Poisson process) and the
second to the fact that the service distribution is exponential (and, hence, Markovian).
The 1 refers to the fact that there is a single server.

If we let $X(t)$ denote the number in the system at time t then $\{X(t), t \ge 0\}$ is a
birth and death process with

\[ \mu_n = \mu, \quad n \ge 1 \]
\[ \lambda_n = \lambda, \quad n \ge 0 \] ■


Example 6.6 (A Multiserver Exponential Queueing System) Consider an exponential
queueing system in which there are s servers available, each serving at rate $\mu$.
An entering customer first waits in line and then goes to the first free server. This is a
birth and death process with parameters

\[ \mu_n = \begin{cases} n\mu, & 1 \le n \le s \\ s\mu, & n > s \end{cases} \]
\[ \lambda_n = \lambda, \quad n \ge 0 \]

To see why this is true, reason as follows: If there are n customers in the system, where
$n \le s$, then n servers will be busy. Since each of these servers works at rate $\mu$, the total
departure rate will be $n\mu$. On the other hand, if there are n customers in the system,
where $n > s$, then all s of the servers will be busy, and thus the total departure rate will
be $s\mu$. This is known as an M/M/s queueing model. ■

Consider now a general birth and death process with birth rates $\{\lambda_n\}$ and death rates
$\{\mu_n\}$, where $\mu_0 = 0$, and let $T_i$ denote the time, starting from state i, it takes for the
process to enter state $i + 1$, $i \ge 0$. We will recursively compute $E[T_i]$, $i \ge 0$, by
starting with $i = 0$. Since $T_0$ is exponential with rate $\lambda_0$, we have

\[ E[T_0] = \frac{1}{\lambda_0} \]

For $i > 0$, we condition on whether the first transition takes the process into state $i - 1$ or
$i + 1$. That is, let

\[ I_i = \begin{cases} 1, & \text{if the first transition from } i \text{ is to } i + 1 \\ 0, & \text{if the first transition from } i \text{ is to } i - 1 \end{cases} \]

and note that

\[ E[T_i \mid I_i = 1] = \frac{1}{\lambda_i + \mu_i}, \qquad E[T_i \mid I_i = 0] = \frac{1}{\lambda_i + \mu_i} + E[T_{i-1}] + E[T_i] \tag{6.3} \]

This follows since, independent of whether the first transition is from a birth or death,
the time until it occurs is exponential with rate $\lambda_i + \mu_i$; if this first transition is a birth,
then the population size is at $i + 1$, so no additional time is needed; whereas if it is a
death, then the population size becomes $i - 1$ and the additional time needed to reach
$i + 1$ is equal to the time it takes to return to state i (this has mean $E[T_{i-1}]$) plus the
additional time it then takes to reach $i + 1$ (this has mean $E[T_i]$). Hence, since the
probability that the first transition is a birth is $\lambda_i/(\lambda_i + \mu_i)$, we see that

\[ E[T_i] = \frac{1}{\lambda_i + \mu_i} + \frac{\mu_i}{\lambda_i + \mu_i}\left(E[T_{i-1}] + E[T_i]\right) \]

or, equivalently,

\[ E[T_i] = \frac{1}{\lambda_i} + \frac{\mu_i}{\lambda_i}E[T_{i-1}], \quad i \ge 1 \]

Starting with $E[T_0] = 1/\lambda_0$, the preceding yields an efficient method to successively
compute $E[T_1]$, $E[T_2]$, and so on.

Suppose now that we wanted to determine the expected time to go from state i to
state j where $i < j$. This can be accomplished using the preceding by noting that this
quantity will equal $E[T_i] + E[T_{i+1}] + \cdots + E[T_{j-1}]$.
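
The recursion for $E[T_i]$ is easy to carry out by machine. The following sketch assumes the rates are supplied as Python lists indexed by state; the function names and the sample rates are hypothetical.

```python
def expected_upcrossing_times(birth, death):
    """E[T_i] for i = 0..len(birth)-1, where T_i is the time to go from i to i + 1.

    birth[i] -- lambda_i (must be positive)
    death[i] -- mu_i, with death[0] == 0
    """
    expected = [1.0 / birth[0]]
    for i in range(1, len(birth)):
        # E[T_i] = 1/lambda_i + (mu_i/lambda_i) E[T_{i-1}]
        expected.append(1.0 / birth[i] + death[i] / birth[i] * expected[i - 1])
    return expected


def expected_time_between(birth, death, i, j):
    """Expected time to go from state i to state j > i: E[T_i] + ... + E[T_{j-1}]."""
    return sum(expected_upcrossing_times(birth[:j], death[:j])[i:j])


# Illustrative rates only.
times = expected_upcrossing_times([2.0, 2.0, 2.0], [0.0, 3.0, 3.0])
```

Each $E[T_i]$ is obtained from its predecessor in constant time, so computing the first n values takes a number of operations proportional to n.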

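Example 6.7 For the birth and death process having parameters $\lambda_i = \lambda$, $\mu_i = \mu$,

\[ E[T_i] = \frac{1}{\lambda} + \frac{\mu}{\lambda}E[T_{i-1}] = \frac{1}{\lambda}\left(1 + \mu E[T_{i-1}]\right) \]

Starting with $E[T_0] = 1/\lambda$, we see that

\[ E[T_1] = \frac{1}{\lambda}\left(1 + \frac{\mu}{\lambda}\right), \qquad E[T_2] = \frac{1}{\lambda}\left[1 + \frac{\mu}{\lambda} + \left(\frac{\mu}{\lambda}\right)^2\right] \]

and, in general,

\[ E[T_i] = \frac{1}{\lambda}\left[1 + \frac{\mu}{\lambda} + \left(\frac{\mu}{\lambda}\right)^2 + \cdots + \left(\frac{\mu}{\lambda}\right)^i\right] = \frac{1 - (\mu/\lambda)^{i+1}}{\lambda - \mu}, \quad i \ge 0 \]

The expected time to reach state j, starting at state k, $k < j$, is

\[ E[\text{time to go from } k \text{ to } j] = \sum_{i=k}^{j-1} E[T_i] = \frac{j - k}{\lambda - \mu} - \frac{(\mu/\lambda)^{k+1}}{\lambda - \mu}\,\frac{1 - (\mu/\lambda)^{j-k}}{1 - \mu/\lambda} \]

The foregoing assumes that $\lambda \ne \mu$. If $\lambda = \mu$, then

\[ E[T_i] = \frac{i + 1}{\lambda}, \]
\[ E[\text{time to go from } k \text{ to } j] = \frac{j(j+1) - k(k+1)}{2\lambda} \]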
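
The closed-form expressions of this example can be cross-checked against the recursion $E[T_i] = 1/\lambda + (\mu/\lambda)E[T_{i-1}]$. The sketch below is one way to do so; both function names and the test parameters are assumptions made for illustration.

```python
def expected_time_closed_form(lam, mu, k, j):
    """Expected time to go from k to j > k when lambda_i = lam and mu_i = mu."""
    if lam == mu:
        return (j * (j + 1) - k * (k + 1)) / (2.0 * lam)
    ratio = mu / lam
    return ((j - k) / (lam - mu)
            - ratio ** (k + 1) / (lam - mu) * (1.0 - ratio ** (j - k)) / (1.0 - ratio))


def expected_time_by_recursion(lam, mu, k, j):
    """Same quantity, summing E[T_i] for i = k..j-1 obtained from the recursion."""
    total, previous = 0.0, 0.0
    for i in range(j):
        # With previous = 0 at i = 0 this gives E[T_0] = 1/lam.
        current = 1.0 / lam + mu / lam * previous
        if i >= k:
            total += current
        previous = current
    return total


# Illustrative parameters; the two values should agree up to rounding.
closed = expected_time_closed_form(2.0, 3.0, 1, 5)
recursive = expected_time_by_recursion(2.0, 3.0, 1, 5)
```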


We can also compute the variance of the time to go from 0 to $i + 1$ by utilizing the
conditional variance formula. First note that Equation (6.3) can be written as

\[ E[T_i \mid I_i] = \frac{1}{\lambda_i + \mu_i} + (1 - I_i)\left(E[T_{i-1}] + E[T_i]\right) \]

Thus,

\[ \mathrm{Var}(E[T_i \mid I_i]) = \left(E[T_{i-1}] + E[T_i]\right)^2 \mathrm{Var}(I_i)
 = \left(E[T_{i-1}] + E[T_i]\right)^2 \frac{\mu_i\lambda_i}{(\mu_i + \lambda_i)^2} \tag{6.4} \]

where $\mathrm{Var}(I_i)$ is as shown since $I_i$ is a Bernoulli random variable with parameter
$p = \lambda_i/(\lambda_i + \mu_i)$. Also, note that if we let $X_i$ denote the time until the transition from
i occurs, then

\[ \mathrm{Var}(T_i \mid I_i = 1) = \mathrm{Var}(X_i \mid I_i = 1) = \mathrm{Var}(X_i) = \frac{1}{(\lambda_i + \mu_i)^2} \tag{6.5} \]

where the preceding uses the fact that the time until transition is independent of the
next state visited. Also,

\[ \mathrm{Var}(T_i \mid I_i = 0) = \mathrm{Var}(X_i + \text{time to get back to } i + \text{time to then reach } i + 1)
 = \mathrm{Var}(X_i) + \mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i) \tag{6.6} \]

where the foregoing uses the fact that the three random variables are independent. We
can rewrite Equations (6.5) and (6.6) as

\[ \mathrm{Var}(T_i \mid I_i) = \mathrm{Var}(X_i) + (1 - I_i)\left[\mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i)\right] \]

so

\[ E[\mathrm{Var}(T_i \mid I_i)] = \frac{1}{(\mu_i + \lambda_i)^2} + \frac{\mu_i}{\mu_i + \lambda_i}\left[\mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i)\right] \tag{6.7} \]

Hence, using the conditional variance formula, which states that $\mathrm{Var}(T_i)$ is the sum of
Equations (6.7) and (6.4), we obtain

\[ \mathrm{Var}(T_i) = \frac{1}{(\mu_i + \lambda_i)^2} + \frac{\mu_i}{\mu_i + \lambda_i}\left[\mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i)\right]
 + \frac{\mu_i\lambda_i}{(\mu_i + \lambda_i)^2}\left(E[T_{i-1}] + E[T_i]\right)^2 \]

or, equivalently,

\[ \mathrm{Var}(T_i) = \frac{1}{\lambda_i(\lambda_i + \mu_i)} + \frac{\mu_i}{\lambda_i}\mathrm{Var}(T_{i-1})
 + \frac{\mu_i}{\mu_i + \lambda_i}\left(E[T_{i-1}] + E[T_i]\right)^2 \]

Starting with $\mathrm{Var}(T_0) = 1/\lambda_0^2$ and using the former recursion to obtain the expectations,
we can recursively compute $\mathrm{Var}(T_i)$. In addition, if we want the variance of the time to
reach state j, starting from state k, $k < j$, then this can be expressed as the time to go
from k to $k + 1$ plus the additional time to go from $k + 1$ to $k + 2$, and so on. Since, by
the Markovian property, these successive random variables are independent, it follows
that

\[ \mathrm{Var}(\text{time to go from } k \text{ to } j) = \sum_{i=k}^{j-1} \mathrm{Var}(T_i) \]
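
The mean and variance recursions can be run side by side. The sketch below is a minimal illustration; the function name `upcrossing_mean_and_variance` and the sample rates are assumptions.

```python
def upcrossing_mean_and_variance(birth, death):
    """Return (mean, var) lists with mean[i] = E[T_i] and var[i] = Var(T_i).

    birth[i] -- lambda_i (positive); death[i] -- mu_i with death[0] == 0.
    """
    mean = [1.0 / birth[0]]
    var = [1.0 / birth[0] ** 2]  # T_0 is exponential with rate lambda_0
    for i in range(1, len(birth)):
        lam, mu = birth[i], death[i]
        m = 1.0 / lam + mu / lam * mean[i - 1]
        s = (1.0 / (lam * (lam + mu))
             + mu / lam * var[i - 1]
             + mu / (mu + lam) * (mean[i - 1] + m) ** 2)
        mean.append(m)
        var.append(s)
    return mean, var


# Illustrative rates: lambda_0 = lambda_1 = 1, mu_1 = 1.
mean, var = upcrossing_mean_and_variance([1.0, 1.0], [0.0, 1.0])
```

When every death rate is zero the recursion collapses to $\mathrm{Var}(T_i) = 1/\lambda_i^2$, as it should for an exponential holding time.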


6.4 The Transition Probability Function $P_{ij}(t)$

Let

\[ P_{ij}(t) = P\{X(t+s) = j \mid X(s) = i\} \]

denote the probability that a process presently in state i will be in state j a time t later. 
These quantities are often called the transition probabilities of the continuous-time 
Markov chain. 

We can explicitly determine $P_{ij}(t)$ in the case of a pure birth process having distinct
birth rates. For such a process, let $X_k$ denote the time the process spends in state k
before making a transition into state $k + 1$, $k \ge 1$. Suppose that the process is presently
in state i, and let $j > i$. Then, as $X_i$ is the time it spends in state i before moving to
state $i + 1$, and $X_{i+1}$ is the time it then spends in state $i + 1$ before moving to state
$i + 2$, and so on, it follows that $\sum_{k=i}^{j-1} X_k$ is the time it takes until the process enters
state j. Now, if the process has not yet entered state j by time t, then its state at time t
is smaller than j, and vice versa. That is,

\[ X(t) < j \iff X_i + \cdots + X_{j-1} > t \]

Therefore, for $i < j$, we have for a pure birth process that

\[ P\{X(t) < j \mid X(0) = i\} = P\left\{\sum_{k=i}^{j-1} X_k > t\right\} \]

However, since $X_i, \ldots, X_{j-1}$ are independent exponential random variables with respective
rates $\lambda_i, \ldots, \lambda_{j-1}$, we obtain from the preceding and Equation (5.9), which gives
the tail distribution function of $\sum_{k=i}^{j-1} X_k$, that

\[ P\{X(t) < j \mid X(0) = i\} = \sum_{k=i}^{j-1} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j-1} \frac{\lambda_r}{\lambda_r - \lambda_k} \]

Replacing j by $j + 1$ in the preceding gives

\[ P\{X(t) < j + 1 \mid X(0) = i\} = \sum_{k=i}^{j} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j} \frac{\lambda_r}{\lambda_r - \lambda_k} \]

Since

\[ P\{X(t) = j \mid X(0) = i\} = P\{X(t) < j + 1 \mid X(0) = i\} - P\{X(t) < j \mid X(0) = i\} \]

and since $P_{ii}(t) = P\{X_i > t\} = e^{-\lambda_i t}$, we have shown the following.

Proposition 6.1 For a pure birth process having $\lambda_i \ne \lambda_j$ when $i \ne j$

\[ P_{ij}(t) = \sum_{k=i}^{j} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j} \frac{\lambda_r}{\lambda_r - \lambda_k}
 - \sum_{k=i}^{j-1} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j-1} \frac{\lambda_r}{\lambda_r - \lambda_k}, \quad i < j \]
\[ P_{ii}(t) = e^{-\lambda_i t} \]
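
Proposition 6.1 can be evaluated directly once the tail probability of a sum of independent exponentials with distinct rates is coded. The following sketch does this; the names `hypoexponential_tail` and `pure_birth_transition`, and the choice of passing the rates as a function of the state, are assumptions made for illustration.

```python
import math


def hypoexponential_tail(rates, t):
    """P{X_1 + ... + X_n > t} for independent exponentials with distinct rates."""
    total = 0.0
    for k, rate_k in enumerate(rates):
        product = 1.0
        for r, rate_r in enumerate(rates):
            if r != k:
                product *= rate_r / (rate_r - rate_k)
        total += math.exp(-rate_k * t) * product
    return total


def pure_birth_transition(rate, i, j, t):
    """P_ij(t) for a pure birth process; rate(n) returns the (distinct) birth rate in state n."""
    if j < i:
        return 0.0
    if j == i:
        return math.exp(-rate(i) * t)
    rates = [rate(n) for n in range(i, j + 1)]
    # P{X(t) < j + 1 | X(0) = i} - P{X(t) < j | X(0) = i}
    return hypoexponential_tail(rates, t) - hypoexponential_tail(rates[:-1], t)


# Illustrative rates lambda_n = n, starting from state 1.
p12 = pure_birth_transition(lambda n: float(n), 1, 2, 0.7)
```

Because the formula divides by differences of rates, it can lose accuracy when two rates are nearly equal; the sketch makes no attempt to guard against that.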


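Example 6.8 Consider the Yule process, which is a pure birth process in which each
individual in the population independently gives birth at rate $\lambda$, and so $\lambda_n = n\lambda$, $n \ge 1$.
Letting $i = 1$, we obtain from Proposition 6.1

\[ P_{1j}(t) = \sum_{k=1}^{j} e^{-k\lambda t} \prod_{r \ne k,\, r=1}^{j} \frac{r}{r - k}
 - \sum_{k=1}^{j-1} e^{-k\lambda t} \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k} \]
\[ = e^{-j\lambda t} \prod_{r=1}^{j-1} \frac{r}{r - j}
 + \sum_{k=1}^{j-1} e^{-k\lambda t} \left( \prod_{r \ne k,\, r=1}^{j} \frac{r}{r - k} - \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k} \right) \]
\[ = e^{-j\lambda t}(-1)^{j-1} + \sum_{k=1}^{j-1} e^{-k\lambda t} \left( \frac{j}{j - k} - 1 \right) \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k} \]

Now,

\[ \frac{k}{j - k} \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k}
 = \frac{(j-1)!}{(1 - k)(2 - k) \cdots (k - 1 - k)(j - k)!}
 = (-1)^{k-1} \binom{j-1}{k-1} \]

so

\[ P_{1j}(t) = \sum_{k=1}^{j} \binom{j-1}{k-1} e^{-k\lambda t} (-1)^{k-1} \]
\[ = e^{-\lambda t} \sum_{i=0}^{j-1} \binom{j-1}{i} e^{-i\lambda t} (-1)^{i} \]
\[ = e^{-\lambda t}\left(1 - e^{-\lambda t}\right)^{j-1} \]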
Thus, starting with a single individual, the population size at time t has a geometric
distribution with mean $e^{\lambda t}$. If the population starts with i individuals, then we can
regard each of these individuals as starting her own independent Yule process, and so
the population at time t will be the sum of i independent and identically distributed
geometric random variables with parameter $e^{-\lambda t}$. But this means that the conditional
distribution of $X(t)$, given that $X(0) = i$, is the same as the distribution of the number
of times that a coin that lands heads on each flip with probability $e^{-\lambda t}$ must be flipped
to amass a total of i heads. Hence, the population size at time t has a negative binomial
distribution with parameters i and $e^{-\lambda t}$, so

\[ P_{ij}(t) = \binom{j-1}{i-1} e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{j-i}, \quad j \ge i \ge 1 \]

(We could, of course, have used Proposition 6.1 to immediately obtain an equation for
$P_{ij}(t)$, rather than just using it for $P_{1j}(t)$, but the algebra that would have then been
needed to show the equivalence of the resulting expression to the preceding result is
somewhat involved.) ■
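
The negative binomial form is convenient to compute with. The sketch below evaluates it and can be used to check numerically that the probabilities sum to 1 and that the mean is $i e^{\lambda t}$, the mean of a sum of i geometric random variables with parameter $e^{-\lambda t}$. The function name `yule_transition` and the parameter values are illustrative assumptions.

```python
import math


def yule_transition(i, j, t, lam):
    """P_ij(t) for the Yule process with individual birth rate lam (requires i >= 1)."""
    if j < i:
        return 0.0
    p = math.exp(-lam * t)  # success probability of the geometric components
    return math.comb(j - 1, i - 1) * p ** i * (1.0 - p) ** (j - i)


# Illustrative values; the series is truncated at j = 400 in place of infinity.
i, t, lam = 3, 1.0, 0.5
probabilities = [yule_transition(i, j, t, lam) for j in range(i, 401)]
total = sum(probabilities)
mean = sum(j * p for j, p in zip(range(i, 401), probabilities))
```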

We shall now derive a set of differential equations that the transition probabilities 
$P_{ij}(t)$ satisfy in a general continuous-time Markov chain. However, first we need a
definition and a pair of lemmas.

For any pair of states i and j, let

\[ q_{ij} = v_i P_{ij} \]

Since $v_i$ is the rate at which the process makes a transition when in state i and $P_{ij}$ is
the probability that this transition is into state j, it follows that $q_{ij}$ is the rate, when
in state i, at which the process makes a transition into state j. The quantities $q_{ij}$ are
called the instantaneous transition rates. Since

\[ v_i = \sum_j v_i P_{ij} = \sum_j q_{ij} \]

and

\[ P_{ij} = \frac{q_{ij}}{v_i} = \frac{q_{ij}}{\sum_j q_{ij}} \]

it follows that specifying the instantaneous transition rates determines the parameters
of the continuous-time Markov chain.

Lemma 6.2

(a) \[ \lim_{h \to 0} \frac{1 - P_{ii}(h)}{h} = v_i \]

(b) \[ \lim_{h \to 0} \frac{P_{ij}(h)}{h} = q_{ij} \quad \text{when } i \ne j \]

Proof. We first note that since the amount of time until a transition occurs is exponentially
distributed it follows that the probability of two or more transitions in a time
h is $o(h)$. Thus, $1 - P_{ii}(h)$, the probability that a process in state i at time 0 will not
be in state i at time h, equals the probability that a transition occurs within time h plus
something small compared to h. Therefore,

\[ 1 - P_{ii}(h) = v_i h + o(h) \]

and part (a) is proven. To prove part (b), we note that $P_{ij}(h)$, the probability that the
process goes from state i to state j in a time h, equals the probability that a transition
occurs in this time multiplied by the probability that the transition is into state j, plus
something small compared to h. That is,

\[ P_{ij}(h) = h v_i P_{ij} + o(h) \]

and part (b) is proven. ■

Lemma 6.3 For all $s \ge 0$, $t \ge 0$,

\[ P_{ij}(t + s) = \sum_{k=0}^{\infty} P_{ik}(t)P_{kj}(s) \tag{6.8} \]

Proof. In order for the process to go from state i to state j in time $t + s$, it must be
somewhere at time t and thus

\[ P_{ij}(t + s) = P\{X(t + s) = j \mid X(0) = i\} \]
\[ = \sum_{k=0}^{\infty} P\{X(t + s) = j,\ X(t) = k \mid X(0) = i\} \]
\[ = \sum_{k=0}^{\infty} P\{X(t + s) = j \mid X(t) = k,\ X(0) = i\} \cdot P\{X(t) = k \mid X(0) = i\} \]
\[ = \sum_{k=0}^{\infty} P\{X(t + s) = j \mid X(t) = k\} \cdot P\{X(t) = k \mid X(0) = i\} \]
\[ = \sum_{k=0}^{\infty} P_{kj}(s)P_{ik}(t) \]


and the proof is completed.

The set of Equations (6.8) is known as the Chapman-Kolmogorov equations. From
Lemma 6.3, we obtain

\[ P_{ij}(h + t) - P_{ij}(t) = \sum_{k=0}^{\infty} P_{ik}(h)P_{kj}(t) - P_{ij}(t) \]
\[ = \sum_{k \ne i} P_{ik}(h)P_{kj}(t) - [1 - P_{ii}(h)]P_{ij}(t) \]

and thus

\[ \lim_{h \to 0} \frac{P_{ij}(t + h) - P_{ij}(t)}{h}
 = \lim_{h \to 0} \left\{ \sum_{k \ne i} \frac{P_{ik}(h)}{h}P_{kj}(t) - \left[\frac{1 - P_{ii}(h)}{h}\right]P_{ij}(t) \right\} \]

Now, assuming that we can interchange the limit and the summation in the preceding
and applying Lemma 6.2, we obtain

\[ P'_{ij}(t) = \sum_{k \ne i} q_{ik}P_{kj}(t) - v_i P_{ij}(t) \]

It turns out that this interchange can indeed be justified and, hence, we have the
following theorem.

Theorem 6.1 (Kolmogorov’s Backward Equations) For all states i, j, and times
$t \ge 0$,

\[ P'_{ij}(t) = \sum_{k \ne i} q_{ik}P_{kj}(t) - v_i P_{ij}(t) \]

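Example 6.9 The backward equations for the pure birth process become

\[ P'_{ij}(t) = \lambda_i P_{i+1,j}(t) - \lambda_i P_{ij}(t) \]

Example 6.10 The backward equations for the birth and death process become

\[ P'_{0j}(t) = \lambda_0 P_{1j}(t) - \lambda_0 P_{0j}(t), \]
\[ P'_{ij}(t) = (\lambda_i + \mu_i)\left[\frac{\lambda_i}{\lambda_i + \mu_i}P_{i+1,j}(t) + \frac{\mu_i}{\lambda_i + \mu_i}P_{i-1,j}(t)\right] - (\lambda_i + \mu_i)P_{ij}(t) \]

or equivalently,

\[ P'_{0j}(t) = \lambda_0[P_{1j}(t) - P_{0j}(t)], \]
\[ P'_{ij}(t) = \lambda_i P_{i+1,j}(t) + \mu_i P_{i-1,j}(t) - (\lambda_i + \mu_i)P_{ij}(t), \quad i > 0 \tag{6.9} \]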















Example 6.11 (A Continuous-Time Markov Chain Consisting of Two States)

Consider a machine that works for an exponential amount of time having mean $1/\lambda$
before breaking down; and suppose that it takes an exponential amount of time having
mean $1/\mu$ to repair the machine. If the machine is in working condition at time 0, then
what is the probability that it will be working at time $t = 10$?

To answer this question, we note that the process is a birth and death process (with
state 0 meaning that the machine is working and state 1 that it is being repaired) having
parameters

\[ \lambda_0 = \lambda, \quad \mu_1 = \mu, \]
\[ \lambda_i = 0, \ i \ne 0, \quad \mu_i = 0, \ i \ne 1 \]

We shall derive the desired probability, namely, $P_{00}(10)$ by solving the set of differential
equations given in Example 6.10. From Equation (6.9), we obtain

\[ P'_{00}(t) = \lambda[P_{10}(t) - P_{00}(t)], \tag{6.10} \]
\[ P'_{10}(t) = \mu P_{00}(t) - \mu P_{10}(t) \tag{6.11} \]

Multiplying Equation (6.10) by $\mu$ and Equation (6.11) by $\lambda$ and then adding the two
equations yields

\[ \mu P'_{00}(t) + \lambda P'_{10}(t) = 0 \]

By integrating, we obtain

\[ \mu P_{00}(t) + \lambda P_{10}(t) = c \]

However, since $P_{00}(0) = 1$ and $P_{10}(0) = 0$, we obtain $c = \mu$ and hence,

\[ \mu P_{00}(t) + \lambda P_{10}(t) = \mu \tag{6.12} \]

or equivalently,

\[ \lambda P_{10}(t) = \mu[1 - P_{00}(t)] \]

By substituting this result in Equation (6.10), we obtain

\[ P'_{00}(t) = \mu[1 - P_{00}(t)] - \lambda P_{00}(t) = \mu - (\mu + \lambda)P_{00}(t) \]

Letting

\[ h(t) = P_{00}(t) - \frac{\mu}{\mu + \lambda} \]

we have

\[ h'(t) = \mu - (\mu + \lambda)\left[h(t) + \frac{\mu}{\mu + \lambda}\right] = -(\mu + \lambda)h(t) \]

or

\[ \frac{h'(t)}{h(t)} = -(\mu + \lambda) \]

By integrating both sides, we obtain

\[ \log h(t) = -(\mu + \lambda)t + c \]

or

\[ h(t) = Ke^{-(\mu + \lambda)t} \]

and thus

\[ P_{00}(t) = Ke^{-(\mu + \lambda)t} + \frac{\mu}{\mu + \lambda} \]

which finally yields, by setting $t = 0$ and using the fact that $P_{00}(0) = 1$,

\[ P_{00}(t) = \frac{\lambda}{\mu + \lambda}e^{-(\mu + \lambda)t} + \frac{\mu}{\mu + \lambda} \]

From Equation (6.12), this also implies that

\[ P_{10}(t) = \frac{\mu}{\mu + \lambda} - \frac{\mu}{\mu + \lambda}e^{-(\mu + \lambda)t} \]

Hence, our desired probability is as follows:

\[ P_{00}(10) = \frac{\lambda}{\mu + \lambda}e^{-10(\mu + \lambda)} + \frac{\mu}{\mu + \lambda} \]
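
The closed forms for $P_{00}(t)$ and $P_{10}(t)$ make the two-state chain a convenient test case for the Chapman-Kolmogorov equations (Lemma 6.3). The sketch below builds the full 2 x 2 matrix, filling in $P_{01}(t) = 1 - P_{00}(t)$ and $P_{11}(t) = 1 - P_{10}(t)$ from the fact that each row sums to 1. The function names and the failure and repair rates are hypothetical.

```python
import math


def two_state_transition(t, lam, mu):
    """Matrix [[P00, P01], [P10, P11]] at time t; state 0 = working, state 1 = in repair."""
    decay = math.exp(-(lam + mu) * t)
    p00 = lam / (lam + mu) * decay + mu / (lam + mu)
    p10 = mu / (lam + mu) - mu / (lam + mu) * decay
    return [[p00, 1.0 - p00], [p10, 1.0 - p10]]


def matrix_product(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


# Illustrative rates: failures at rate 0.2, repairs at rate 1.5.
lam, mu = 0.2, 1.5
direct = two_state_transition(10.0, lam, mu)
composed = matrix_product(two_state_transition(4.0, lam, mu),
                          two_state_transition(6.0, lam, mu))
```

Here `direct` and `composed` should agree entry by entry, since $P(10) = P(4)P(6)$.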


Another set of differential equations, different from the backward equations, may
also be derived. This set of equations, known as Kolmogorov’s forward equations,
is derived as follows. From the Chapman-Kolmogorov equations (Lemma 6.3), we
have

\[ P_{ij}(t + h) - P_{ij}(t) = \sum_{k=0}^{\infty} P_{ik}(t)P_{kj}(h) - P_{ij}(t) \]
\[ = \sum_{k \ne j} P_{ik}(t)P_{kj}(h) - [1 - P_{jj}(h)]P_{ij}(t) \]

and thus

\[ \lim_{h \to 0} \frac{P_{ij}(t + h) - P_{ij}(t)}{h}
 = \lim_{h \to 0} \left\{ \sum_{k \ne j} P_{ik}(t)\frac{P_{kj}(h)}{h} - \left[\frac{1 - P_{jj}(h)}{h}\right]P_{ij}(t) \right\} \]

and, assuming that we can interchange limit with summation, we obtain from Lemma 6.2

\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj}P_{ik}(t) - v_j P_{ij}(t) \]

Unfortunately, we cannot always justify the interchange of limit and summation and
thus the preceding is not always valid. However, it does hold in most models, including
all birth and death processes and all finite state models. We thus have the following.

Theorem 6.2 (Kolmogorov’s Forward Equations) Under suitable regularity
conditions,

\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj}P_{ik}(t) - v_j P_{ij}(t) \tag{6.13} \]

We shall now solve the forward equations for the pure birth process. For this process,
Equation (6.13) reduces to

\[ P'_{ij}(t) = \lambda_{j-1}P_{i,j-1}(t) - \lambda_j P_{ij}(t) \]

However, by noting that $P_{ij}(t) = 0$ whenever $j < i$ (since no deaths can occur), we
can rewrite the preceding equation to obtain

\[ P'_{ii}(t) = -\lambda_i P_{ii}(t), \]
\[ P'_{ij}(t) = \lambda_{j-1}P_{i,j-1}(t) - \lambda_j P_{ij}(t), \quad j \ge i + 1 \tag{6.14} \]

Proposition 6.4 For a pure birth process,

\[ P_{ii}(t) = e^{-\lambda_i t}, \quad i \ge 0 \]
\[ P_{ij}(t) = \lambda_{j-1}e^{-\lambda_j t} \int_0^t e^{\lambda_j s}P_{i,j-1}(s)\, ds, \quad j \ge i + 1 \]

Proof. The fact that $P_{ii}(t) = e^{-\lambda_i t}$ follows from Equation (6.14) by integrating and
using the fact that $P_{ii}(0) = 1$. To prove the corresponding result for $P_{ij}(t)$, we note by
Equation (6.14) that

\[ e^{\lambda_j t}\left[P'_{ij}(t) + \lambda_j P_{ij}(t)\right] = e^{\lambda_j t}\lambda_{j-1}P_{i,j-1}(t) \]

or

\[ \frac{d}{dt}\left[e^{\lambda_j t}P_{ij}(t)\right] = \lambda_{j-1}e^{\lambda_j t}P_{i,j-1}(t) \]

Hence, since $P_{ij}(0) = 0$, we obtain the desired results. ■
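
The forward equations (6.14) can also be integrated numerically. The sketch below uses a plain forward Euler scheme, which is an assumption chosen for brevity rather than accuracy, and compares the result with the Poisson probabilities $e^{-\lambda t}(\lambda t)^m/m!$ that apply when every birth rate equals $\lambda$ (the Poisson process of Example 6.2). The function name and step count are illustrative.

```python
import math


def pure_birth_forward_euler(rates, t, steps=20000):
    """Approximate P_{i,i+m}(t), m = 0..len(rates)-1, for a pure birth process.

    rates[m] is the birth rate in state i + m; the chain starts in state i.
    """
    h = t / steps
    p = [1.0] + [0.0] * (len(rates) - 1)
    for _ in range(steps):
        # P'_ii = -lambda_i P_ii
        updated = [p[0] - h * rates[0] * p[0]]
        for m in range(1, len(rates)):
            # P'_ij = lambda_{j-1} P_{i,j-1} - lambda_j P_ij
            updated.append(p[m] + h * (rates[m - 1] * p[m - 1] - rates[m] * p[m]))
        p = updated
    return p


# Constant rate 2 on states 0..4, evaluated at t = 1.
approx = pure_birth_forward_euler([2.0] * 5, t=1.0)
exact = [math.exp(-2.0) * 2.0 ** m / math.factorial(m) for m in range(5)]
```

Since $P_{ij}(t)$ depends only on states between i and j in a pure birth process, restricting the computation to finitely many states introduces no truncation error; only the Euler step size limits the accuracy.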

Example 6.12 (Forward Equations for Birth and Death Process) The forward
equations (Equation 6.13) for the general birth and death process become

\[ P'_{i0}(t) = \sum_{k \ne 0} q_{k0}P_{ik}(t) - \lambda_0 P_{i0}(t)
 = \mu_1 P_{i1}(t) - \lambda_0 P_{i0}(t) \tag{6.15} \]

\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj}P_{ik}(t) - (\lambda_j + \mu_j)P_{ij}(t)
 = \lambda_{j-1}P_{i,j-1}(t) + \mu_{j+1}P_{i,j+1}(t) - (\lambda_j + \mu_j)P_{ij}(t) \tag{6.16} \]


6.5 Limiting Probabilities

In analogy with a basic result in discrete-time Markov chains, the probability that a
continuous-time Markov chain will be in state j at time t often converges to a limiting
value that is independent of the initial state. That is, if we call this value $P_j$, then

\[ P_j = \lim_{t \to \infty} P_{ij}(t) \]

where we are assuming that the limit exists and is independent of the initial state i.

To derive a set of equations for the $P_j$, consider first the set of forward equations

\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj}P_{ik}(t) - v_j P_{ij}(t) \tag{6.17} \]

Now, if we let t approach $\infty$, then assuming that we can interchange limit and summation,
we obtain

\[ \lim_{t \to \infty} P'_{ij}(t) = \lim_{t \to \infty}\left[\sum_{k \ne j} q_{kj}P_{ik}(t) - v_j P_{ij}(t)\right]
 = \sum_{k \ne j} q_{kj}P_k - v_j P_j \]

However, as $P_{ij}(t)$ is a bounded function (being a probability it is always between 0
and 1), it follows that if $P'_{ij}(t)$ converges, then it must converge to 0 (why is this?).

Hence, we must have

\[ 0 = \sum_{k \ne j} q_{kj}P_k - v_j P_j \]

or

\[ v_j P_j = \sum_{k \ne j} q_{kj}P_k, \quad \text{all states } j \tag{6.18} \]

The preceding set of equations, along with the equation

\[ \sum_j P_j = 1 \tag{6.19} \]

can be used to solve for the limiting probabilities.

Remark

(i) We have assumed that the limiting probabilities $P_j$ exist. A sufficient condition
for this is that

(a) all states of the Markov chain communicate in the sense that starting in state
i there is a positive probability of ever being in state j, for all i, j and
(b) the Markov chain is positive recurrent in the sense that, starting in any state,
the mean time to return to that state is finite

If conditions (a) and (b) hold, then the limiting probabilities will exist and satisfy
Equations (6.18) and (6.19). In addition, $P_j$ also will have the interpretation of
being the long-run proportion of time that the process is in state j.

(ii) Equations (6.18) and (6.19) have a nice interpretation: In any interval (0, t) the
number of transitions into state j must equal to within 1 the number of transitions
out of state j (why?). Hence, in the long run, the rate at which transitions into
state j occur must equal the rate at which transitions out of state j occur. When
the process is in state j, it leaves at rate $v_j$, and, as $P_j$ is the proportion of time it
is in state j, it thus follows that

\[ v_j P_j = \text{rate at which the process leaves state } j \]

Similarly, when the process is in state k, it enters j at a rate $q_{kj}$. Hence, as $P_k$ is
the proportion of time in state k, we see that the rate at which transitions from k
to j occur is just $q_{kj}P_k$; thus

\[ \sum_{k \ne j} q_{kj}P_k = \text{rate at which the process enters state } j \]

So, Equation (6.18) is just a statement of the equality of the rates at which the
process enters and leaves state j. Because it balances (that is, equates) these rates,
Equation (6.18) is sometimes referred to as a set of “balance equations.”

(iii) When the limiting probabilities $P_j$ exist, we say that the chain is ergodic. The $P_j$
are sometimes called stationary probabilities since it can be shown that (as in the
discrete-time case) if the initial state is chosen according to the distribution $\{P_j\}$,
then the probability of being in state j at time t is $P_j$, for all t.

Let us now determine the limiting probabilities for a birth and death process. From 
Equation (6.18) or equivalently, by equating the rate at which the process leaves a state 
with the rate at which it enters that state, we obtain 


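State        Rate at which leave = rate at which enter
0            $\lambda_0 P_0 = \mu_1 P_1$
1            $(\lambda_1 + \mu_1)P_1 = \mu_2 P_2 + \lambda_0 P_0$
2            $(\lambda_2 + \mu_2)P_2 = \mu_3 P_3 + \lambda_1 P_1$
n, n ≥ 1     $(\lambda_n + \mu_n)P_n = \mu_{n+1}P_{n+1} + \lambda_{n-1}P_{n-1}$

By adding to each equation the equation preceding it, we obtain

\[ \lambda_0 P_0 = \mu_1 P_1, \]
\[ \lambda_1 P_1 = \mu_2 P_2, \]
\[ \lambda_2 P_2 = \mu_3 P_3, \]
\[ \vdots \]
\[ \lambda_n P_n = \mu_{n+1}P_{n+1}, \quad n \ge 0 \]

Solving in terms of $P_0$ yields

\[ P_1 = \frac{\lambda_0}{\mu_1}P_0, \]
\[ P_2 = \frac{\lambda_1}{\mu_2}P_1 = \frac{\lambda_1\lambda_0}{\mu_2\mu_1}P_0, \]
\[ P_3 = \frac{\lambda_2}{\mu_3}P_2 = \frac{\lambda_2\lambda_1\lambda_0}{\mu_3\mu_2\mu_1}P_0, \]
\[ \vdots \]
\[ P_n = \frac{\lambda_{n-1}}{\mu_n}P_{n-1} = \frac{\lambda_{n-1}\lambda_{n-2} \cdots \lambda_1\lambda_0}{\mu_n\mu_{n-1} \cdots \mu_2\mu_1}P_0 \]

And by using the fact that $\sum_{n=0}^{\infty} P_n = 1$, we obtain

\[ 1 = P_0 + P_0 \sum_{n=1}^{\infty} \frac{\lambda_{n-1} \cdots \lambda_1\lambda_0}{\mu_n \cdots \mu_2\mu_1} \]

or

\[ P_0 = \frac{1}{1 + \sum_{n=1}^{\infty} \dfrac{\lambda_0\lambda_1 \cdots \lambda_{n-1}}{\mu_1\mu_2 \cdots \mu_n}} \]

and so

\[ P_n = \frac{\lambda_0\lambda_1 \cdots \lambda_{n-1}}{\mu_1\mu_2 \cdots \mu_n\left(1 + \sum_{n=1}^{\infty} \dfrac{\lambda_0\lambda_1 \cdots \lambda_{n-1}}{\mu_1\mu_2 \cdots \mu_n}\right)}, \quad n \ge 1 \tag{6.20} \]

The foregoing equations also show us what condition is necessary for these limiting
probabilities to exist. Namely, it is necessary that

\[ \sum_{n=1}^{\infty} \frac{\lambda_0\lambda_1 \cdots \lambda_{n-1}}{\mu_1\mu_2 \cdots \mu_n} < \infty \tag{6.21} \]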


This condition also may be shown to be sufficient. 
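
For a birth and death process with finitely many states, Equation (6.20) can be evaluated by accumulating the products $\lambda_0 \cdots \lambda_{n-1}/(\mu_1 \cdots \mu_n)$ and normalizing. The sketch below assumes a finite state space 0..N with no births out of state N; the function name and the sample rates are hypothetical.

```python
def birth_death_limiting_probabilities(birth, death):
    """Limiting probabilities P_0..P_N of a finite birth and death process.

    birth[n] -- lambda_n for n = 0..N-1 (no births are possible from state N)
    death[n] -- mu_{n+1} for n = 0..N-1, all positive
    """
    weights = [1.0]
    for lam, mu in zip(birth, death):
        # weight_n = lambda_0 ... lambda_{n-1} / (mu_1 ... mu_n)
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative three-state chain: lambda_0 = lambda_1 = 1, mu_1 = mu_2 = 2.
probs = birth_death_limiting_probabilities([1.0, 1.0], [2.0, 2.0])
```

The result can be checked against the balance relations $\lambda_n P_n = \mu_{n+1}P_{n+1}$ derived above.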

In the multiserver exponential queueing system (Example 6.6), Condition (6.21)
reduces to

\[ \sum_{n=s+1}^{\infty} \frac{\lambda^n}{(s\mu)^n} < \infty \]

which is equivalent to $\lambda/s\mu < 1$.

For the linear growth model with immigration (Example 6.4), Condition (6.21)
reduces to

\[
\sum_{n=1}^{\infty} \frac{\theta(\theta + \lambda) \cdots (\theta + (n-1)\lambda)}{n!\,\mu^n} < \infty
\]

Using the ratio test, the preceding will converge when

\[
\lim_{n \to \infty}
\frac{\theta(\theta + \lambda) \cdots (\theta + n\lambda)}{(n+1)!\,\mu^{n+1}}
\cdot
\frac{n!\,\mu^n}{\theta(\theta + \lambda) \cdots (\theta + (n-1)\lambda)}
= \lim_{n \to \infty} \frac{\theta + n\lambda}{(n+1)\mu}
= \frac{\lambda}{\mu} < 1
\]

That is, the condition is satisfied when $\lambda < \mu$. When $\lambda \ge \mu$ it is easy to show that
Condition (6.21) is not satisfied.
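
As a numerical companion to Equation (6.20), the following is a minimal sketch that computes the limiting probabilities of a birth and death process with finitely many states, so that the sum in (6.21) is automatically finite. The function name, the argument layout, and the sample rates are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def birth_death_limiting_probs(birth_rates, death_rates):
    """Limiting probabilities of a birth and death process on states 0..K.

    birth_rates[n] is lambda_n for n = 0..K-1 and death_rates[n] is mu_{n+1}
    for n = 0..K-1 (an illustrative argument layout, not from the text).
    """
    if len(birth_rates) != len(death_rates):
        raise ValueError("need exactly one death rate per birth rate")
    # weights[n] = lambda_0 ... lambda_{n-1} / (mu_1 ... mu_n), with weights[0] = 1
    weights = [Fraction(1)]
    for lam, mu in zip(birth_rates, death_rates):
        weights.append(weights[-1] * Fraction(lam) / Fraction(mu))
    total = sum(weights)  # plays the role of the denominator in (6.20)
    return [w / total for w in weights]


# Hypothetical rates: lambda_0 = 1, lambda_1 = 2, mu_1 = 3, mu_2 = 4.
probs = birth_death_limiting_probs([1, 2], [3, 4])
print(probs)  # [Fraction(2, 3), Fraction(2, 9), Fraction(1, 9)]
```

Exact fractions are used so that the relations $\lambda_n P_n = \mu_{n+1} P_{n+1}$ can be checked without rounding error.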


Example 6.13 (A Machine Repair Model) Consider a job shop that consists of M
machines and one serviceman. Suppose that the amount of time each machine runs
before breaking down is exponentially distributed with mean $1/\lambda$, and suppose that
the amount of time that it takes for the serviceman to fix a machine is exponentially
distributed with mean $1/\mu$. We shall attempt to answer these questions: (a) What is the
average number of machines not in use? (b) What proportion of time is each machine
in use?


Solution: If we say that the system is in state n whenever n machines are not in
use, then the preceding is a birth and death process having parameters

\[
\mu_n = \mu, \qquad n \ge 1
\]
\[
\lambda_n =
\begin{cases}
(M - n)\lambda, & n \le M \\
0, & n > M
\end{cases}
\]

This is so in the sense that a failing machine is regarded as an arrival and a fixed
machine as a departure. If any machines are broken down, then since the serviceman's
rate is $\mu$, $\mu_n = \mu$. On the other hand, if n machines are not in use, then since the
$M - n$ machines in use each fail at a rate $\lambda$, it follows that $\lambda_n = (M - n)\lambda$. From
Equation (6.20) we have that $P_n$, the probability that n machines will not be in use,
is given by

\[
P_0 = \frac{1}{1 + \sum_{n=1}^{M} \left[ M\lambda (M-1)\lambda \cdots (M-n+1)\lambda / \mu^n \right]}
    = \frac{1}{1 + \sum_{n=1}^{M} (\lambda/\mu)^n M!/(M-n)!},
\]
\[
P_n = \frac{(\lambda/\mu)^n M!/(M-n)!}{1 + \sum_{n=1}^{M} (\lambda/\mu)^n M!/(M-n)!},
\qquad n = 0, 1, \ldots, M
\]


Hence, the average number of machines not in use is given by

\[
\sum_{n=0}^{M} n P_n
= \frac{\sum_{n=0}^{M} n (\lambda/\mu)^n M!/(M-n)!}{1 + \sum_{n=1}^{M} (\lambda/\mu)^n M!/(M-n)!}
\tag{6.22}
\]

To obtain the long-run proportion of time that a given machine is working we will
compute the equivalent limiting probability of the machine working. To do so, we
condition on the number of machines that are not working to obtain

\[
\begin{aligned}
P\{\text{machine is working}\}
&= \sum_{n=0}^{M} P\{\text{machine is working} \mid n \text{ not working}\}\, P_n \\
&= \sum_{n=0}^{M} \frac{M - n}{M} P_n
   \qquad (\text{since if } n \text{ are not working, then } M - n \text{ are working!}) \\
&= 1 - \sum_{n=0}^{M} \frac{n P_n}{M}
\end{aligned}
\]

where $\sum_{n=0}^{M} n P_n$ is given by Equation (6.22). ■
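
The two answers of Example 6.13 can be evaluated numerically. The sketch below is one possible way to do so; the function name and the sample values of M, $\lambda$, and $\mu$ are assumptions made for illustration.

```python
from math import factorial


def machine_repair(M, lam, mu):
    """Return (P, mean_down, frac_working) for the machine repair model.

    P[n] is the limiting probability that n machines are not in use,
    mean_down is Equation (6.22), and frac_working is 1 - mean_down / M.
    """
    r = lam / mu
    # Unnormalized weights (lam/mu)^n * M! / (M - n)! for n = 0..M
    weights = [r**n * factorial(M) / factorial(M - n) for n in range(M + 1)]
    total = sum(weights)
    P = [w / total for w in weights]
    mean_down = sum(n * p for n, p in enumerate(P))
    frac_working = 1 - mean_down / M
    return P, mean_down, frac_working


# Hypothetical shop: M = 2 machines, failure rate 1, repair rate 2.
P, mean_down, frac_working = machine_repair(2, 1.0, 2.0)
print(P, mean_down, frac_working)
```

For these sample values the weights are 1, 1, and 0.5, so the probabilities are 0.4, 0.4, and 0.2 up to floating-point rounding.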

Example 6.14 (The M/M/1 Queue) In the M/M/1 queue $\lambda_n = \lambda$, $\mu_n = \mu$ and thus,
from Equation (6.20),

\[
P_n = \frac{(\lambda/\mu)^n}{1 + \sum_{n=1}^{\infty} (\lambda/\mu)^n}
    = (\lambda/\mu)^n (1 - \lambda/\mu), \qquad n \ge 0
\]

provided that $\lambda/\mu < 1$. It is intuitive that $\lambda$ must be less than $\mu$ for limiting probabilities
to exist. Customers arrive at rate $\lambda$ and are served at rate $\mu$, and thus if $\lambda > \mu$, then
they arrive at a faster rate than they can be served and the queue size will go to infinity.
The case $\lambda = \mu$ behaves much like the symmetric random walk of Section 4.3, which
is null recurrent and thus has no limiting probabilities. ■

Example 6.15 Let us reconsider the shoe shine shop of Example 6.1, and determine
the proportion of time the process is in each of the states 0, 1, 2. Because this is not a
birth and death process (since the process can go directly from state 2 to state 0), we
start with the balance equations for the limiting probabilities.

State    Rate that the process leaves = rate that the process enters
0        $\lambda P_0 = \mu_2 P_2$
1        $\mu_1 P_1 = \lambda P_0$
2        $\mu_2 P_2 = \mu_1 P_1$

Solving in terms of $P_0$ yields

\[
P_2 = \frac{\lambda}{\mu_2} P_0, \qquad P_1 = \frac{\lambda}{\mu_1} P_0
\]

which implies, since $P_0 + P_1 + P_2 = 1$, that

\[
P_0 \left[ 1 + \frac{\lambda}{\mu_2} + \frac{\lambda}{\mu_1} \right] = 1
\]

or

\[
P_0 = \frac{\mu_1 \mu_2}{\mu_1 \mu_2 + \lambda(\mu_1 + \mu_2)}
\]

and

\[
P_1 = \frac{\lambda \mu_2}{\mu_1 \mu_2 + \lambda(\mu_1 + \mu_2)}, \qquad
P_2 = \frac{\lambda \mu_1}{\mu_1 \mu_2 + \lambda(\mu_1 + \mu_2)}
\]
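
The closed forms of Example 6.15 can be checked against the balance equations directly. The following is a small sketch under assumed sample rates; the function name is hypothetical.

```python
from fractions import Fraction


def shoe_shine_probs(lam, mu1, mu2):
    """Limiting probabilities (P0, P1, P2) of the shoe shine shop."""
    lam, mu1, mu2 = Fraction(lam), Fraction(mu1), Fraction(mu2)
    denom = mu1 * mu2 + lam * (mu1 + mu2)
    return mu1 * mu2 / denom, lam * mu2 / denom, lam * mu1 / denom


# Hypothetical rates: lambda = 1, mu_1 = 2, mu_2 = 3.
p0, p1, p2 = shoe_shine_probs(1, 2, 3)
print(p0, p1, p2)  # 6/11 3/11 2/11

# Each state's "rate out" equals its "rate in".
print(1 * p0 == 3 * p2, 2 * p1 == 1 * p0, 3 * p2 == 2 * p1)  # True True True
```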


Example 6.16 Consider a set of n components along with a single repairman. Suppose
that component i functions for an exponentially distributed time with rate $\lambda_i$ and then
fails. The time it then takes to repair component i is exponential with rate $\mu_i$,
$i = 1, \ldots, n$. Suppose that when there is more than one failed component the repairman
always works on the most recent failure. For instance, if there are at present two failed
components, say, components 1 and 2 of which 1 has failed most recently, then the
repairman will be working on component 1. However, if component 3 should fail before
1's repair is completed, then the repairman would stop working on component 1 and
switch to component 3 (that is, a newly failed component preempts service).

To analyze the preceding as a continuous-time Markov chain, the state must represent
the set of failed components in the order of failure. That is, the state will be $i_1, i_2, \ldots, i_k$
if $i_1, i_2, \ldots, i_k$ are the k failed components (all the other $n - k$ being functional) with
$i_1$ having been the most recent failure (and is thus presently being repaired), $i_2$ the
second most recent, and so on. Because there are $k!$ possible orderings for a fixed set
of k failed components and $\binom{n}{k}$ choices of that set, it follows that there are

\[
\sum_{k=0}^{n} \binom{n}{k} k! = \sum_{k=0}^{n} \frac{n!}{(n-k)!} = n! \sum_{i=0}^{n} \frac{1}{i!}
\]

possible states.

The balance equations for the limiting probabilities are as follows:

\[
\Bigl( \mu_{i_1} + \sum_{\substack{i \ne i_j \\ j = 1, \ldots, k}} \lambda_i \Bigr) P(i_1, \ldots, i_k)
= \sum_{\substack{i \ne i_j \\ j = 1, \ldots, k}} P(i, i_1, \ldots, i_k)\, \mu_i + P(i_2, \ldots, i_k)\, \lambda_{i_1},
\]
\[
\sum_{i=1}^{n} \lambda_i P(\phi) = \sum_{i=1}^{n} P(i)\, \mu_i
\tag{6.23}
\]

where $\phi$ is the state when all components are working. The preceding equations follow
because state $i_1, \ldots, i_k$ can be left either by a failure of any of the additional components
or by a repair completion of component $i_1$. Also, that state can be entered either by
a repair completion of component i when the state is $i, i_1, \ldots, i_k$ or by a failure of
component $i_1$ when the state is $i_2, \ldots, i_k$.












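However, if we take

\[
P(i_1, \ldots, i_k) = \frac{\lambda_{i_1} \lambda_{i_2} \cdots \lambda_{i_k}}{\mu_{i_1} \mu_{i_2} \cdots \mu_{i_k}} P(\phi)
\tag{6.24}
\]

then it is easily seen that Equations (6.23) are satisfied. Hence, by uniqueness these
must be the limiting probabilities with $P(\phi)$ determined to make their sum equal 1.
That is,

\[
P(\phi) = \left[ 1 + \sum_{i_1, \ldots, i_k} \frac{\lambda_{i_1} \cdots \lambda_{i_k}}{\mu_{i_1} \cdots \mu_{i_k}} \right]^{-1}
\]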


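As an illustration, suppose n = 2 and so there are five states $\phi$, 1, 2, 12, 21. Then from
the preceding we would have

\[
\begin{aligned}
P(\phi) &= \left[ 1 + \frac{\lambda_1}{\mu_1} + \frac{\lambda_2}{\mu_2} + \frac{2\lambda_1\lambda_2}{\mu_1\mu_2} \right]^{-1}, \\
P(1) &= \frac{\lambda_1}{\mu_1} P(\phi), \\
P(2) &= \frac{\lambda_2}{\mu_2} P(\phi), \\
P(1, 2) &= P(2, 1) = \frac{\lambda_1 \lambda_2}{\mu_1 \mu_2} P(\phi)
\end{aligned}
\]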


It is interesting to note, using Equation (6.24), that given the set of failed components, 
each of the possible orderings of these components is equally likely. ■ 
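
Both the state count and the claim that (6.24) satisfies Equations (6.23) can be checked by brute force for small n. The sketch below does so under assumed sample rates; the helper names and the dictionary layout of the rates are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations


def states(n):
    """All ordered tuples of distinct failed components; first entry = most recent."""
    return [s for k in range(n + 1) for s in permutations(range(1, n + 1), k)]


def product_form(state, lam, mu):
    """Unnormalized weight from (6.24): product of lambda_i / mu_i over failed i."""
    w = Fraction(1)
    for i in state:
        w *= Fraction(lam[i]) / Fraction(mu[i])
    return w


def balance_holds(n, lam, mu):
    """Check Equations (6.23) in every state, using the weights of (6.24)."""
    for s in states(n):
        working = [i for i in range(1, n + 1) if i not in s]
        out_rate = sum(lam[i] for i in working) + (mu[s[0]] if s else 0)
        flow_out = out_rate * product_form(s, lam, mu)
        # Entered by a repair of i from state (i, s...) ...
        flow_in = sum(product_form((i,) + s, lam, mu) * mu[i] for i in working)
        # ... or by a failure of s[0] from state s[1:].
        if s:
            flow_in += product_form(s[1:], lam, mu) * lam[s[0]]
        if flow_out != flow_in:
            return False
    return True


print(len(states(2)), len(states(3)))  # 5 16
print(balance_holds(3, {1: 1, 2: 2, 3: 3}, {1: 4, 2: 5, 3: 6}))  # True
```

Normalization is omitted because the balance equations are linear, so checking them on unnormalized weights is enough.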


6.6 Time Reversibility 

Consider a continuous-time Markov chain that is ergodic and let us consider the limiting
probabilities $P_i$ from a different point of view than previously. If we consider the
sequence of states visited, ignoring the amount of time spent in each state during a visit,
then this sequence constitutes a discrete-time Markov chain with transition probabilities
$P_{ij}$. Let us assume that this discrete-time Markov chain, called the embedded chain,
is ergodic and denote by $\pi_i$ its limiting probabilities. That is, the $\pi_i$ are the unique
solution of

\[
\pi_i = \sum_j \pi_j P_{ji}, \quad \text{all } i
\]
\[
\sum_i \pi_i = 1
\]

Now, since $\pi_i$ represents the proportion of transitions that take the process into state
i, and because $1/v_i$ is the mean time spent in state i during a visit, it seems intuitive
that $P_i$, the proportion of time in state i, should be a weighted average of the $\pi_i$ where
$\pi_i$ is weighted proportionately to $1/v_i$. That is, it is intuitive that

\[
P_i = \frac{\pi_i / v_i}{\sum_j \pi_j / v_j}
\tag{6.25}
\]


To check the preceding, recall that the limiting probabilities $P_i$ must satisfy

\[
v_i P_i = \sum_{j \ne i} P_j q_{ji}, \quad \text{all } i
\]

or equivalently, since $P_{ii} = 0$

\[
v_i P_i = \sum_j P_j v_j P_{ji}, \quad \text{all } i
\]

Hence, for the $P_i$s to be given by Equation (6.25), the following would be necessary:

\[
\pi_i = \sum_j \pi_j P_{ji}, \quad \text{all } i
\]

But this, of course, follows since it is in fact the defining equation for the $\pi_i$s.

Suppose now that the continuous-time Markov chain has been in operation for a
long time, and suppose that starting at some (large) time T we trace the process going
backward in time. To determine the probability structure of this reversed process, we
first note that given we are in state i at some time, say t, the probability that we have
been in this state for an amount of time greater than s is just $e^{-v_i s}$. This is so, since

\[
\begin{aligned}
P\{\text{process is in state } i \text{ throughout } [t - s, t] \mid X(t) = i\}
&= \frac{P\{\text{process is in state } i \text{ throughout } [t - s, t]\}}{P\{X(t) = i\}} \\
&= \frac{P\{X(t - s) = i\}\, e^{-v_i s}}{P\{X(t) = i\}} \\
&= e^{-v_i s}
\end{aligned}
\]

since for t large $P\{X(t - s) = i\} = P\{X(t) = i\} = P_i$.

In other words, going backward in time, the amount of time the process spends in
state i is also exponentially distributed with rate $v_i$. In addition, as was shown in Section
4.8, the sequence of states visited by the reversed process constitutes a discrete-time
Markov chain with transition probabilities $Q_{ij}$ given by

\[
Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}
\]

Hence, we see from the preceding that the reversed process is a continuous-time Markov
chain with the same transition rates as the forward-time process and with one-stage
transition probabilities $Q_{ij}$. Therefore, the continuous-time Markov chain will be time
reversible, in the sense that the process reversed in time has the same probabilistic
structure as the original process, if the embedded chain is time reversible. That is, if

\[
\pi_i P_{ij} = \pi_j P_{ji}, \quad \text{for all } i, j
\]

Now, using the fact that $P_i = (\pi_i / v_i) / \bigl( \sum_j \pi_j / v_j \bigr)$, we see that the preceding condition
is equivalent to

\[
P_i q_{ij} = P_j q_{ji}, \quad \text{for all } i, j
\tag{6.26}
\]

Since $P_i$ is the proportion of time in state i and $q_{ij}$ is the rate when in state i that the
process goes to j, the condition of time reversibility is that the rate at which the process
goes directly from state i to state j is equal to the rate at which it goes directly from
j to i. It should be noted that this is exactly the same condition needed for an ergodic
discrete-time Markov chain to be time reversible (see Section 4.8).

An application of the preceding condition for time reversibility yields the following 
proposition concerning birth and death processes. 

Proposition 6.5 An ergodic birth and death process is time reversible. 

Proof. We must show that the rate at which a birth and death process goes from state
i to state i + 1 is equal to the rate at which it goes from i + 1 to i. In any length of
time t the number of transitions from i to i + 1 must equal to within 1 the number from
i + 1 to i (since between each transition from i to i + 1 the process must return to i,
and this can only occur through i + 1, and vice versa). Hence, as the number of such
transitions goes to infinity as $t \to \infty$, it follows that the rate of transitions from i to
i + 1 equals the rate from i + 1 to i. ■

Proposition 6.5 can be used to prove the important result that the output process of 
an M/M/s queue is a Poisson process. We state this as a corollary. 

Corollary 6.6 Consider an M/M/s queue in which customers arrive in accordance
with a Poisson process having rate $\lambda$ and are served by any one of s servers, each
having an exponentially distributed service time with rate $\mu$. If $\lambda < s\mu$, then the output
process of customers departing is, after the process has been in operation for a long
time, a Poisson process with rate $\lambda$.

Proof. Let $X(t)$ denote the number of customers in the system at time t. Since the
M/M/s process is a birth and death process, it follows from Proposition 6.5 that
$\{X(t), t \ge 0\}$ is time reversible. Going forward in time, the time points at which $X(t)$
increases by 1 constitute a Poisson process since these are just the arrival times of
customers. Hence, by time reversibility the time points at which $X(t)$ increases by
1 when we go backward in time also constitute a Poisson process. But these latter
points are exactly the points of time when customers depart (see Figure 6.1). Hence,
the departure times constitute a Poisson process with rate $\lambda$. ■

Example 6.17 Consider a first come first served M/M/1 queue, with arrival rate $\lambda$
and service rate $\mu$, where $\lambda < \mu$, that is in steady state. Given that customer C spends
a total of t time units in the system, what is the conditional distribution of the number
of others that were present when C arrived?

[Figure: plot of $X(t)$ against t, with the marks × on the time axis denoting
 times at which, going backward in time, $X(t)$ increases
 = times at which, going forward in time, $X(t)$ decreases]

Figure 6.1 The number in the system.

Solution: Suppose that C arrived at time s and departed at time s + t. Because the
system is first come first served, the number that were in the system when C arrived
is equal to the number of departures of other customers that occur after time s and
before time s + t, which is equal to the number of arrivals in the reversed process in
that interval of time. Now, in the reversed process C would have arrived at time s + t
and departed at time s. Because the reversed process is also an M/M/1 queueing
system, the number of arrivals during that interval of length t is Poisson distributed
with mean $\lambda t$. (For a more direct argument for this result, see Section 8.3.1.) ■

We have shown that a process is time reversible if and only if

\[
P_i q_{ij} = P_j q_{ji} \quad \text{for all } i \ne j
\]

Analogous to the result for discrete-time Markov chains, if we can find a probability
vector P that satisfies the preceding then the Markov chain is time reversible and the
$P_i$s are the long-run probabilities. That is, we have the following proposition.

Proposition 6.7 If for some set $\{P_i\}$

\[
\sum_i P_i = 1, \quad P_i \ge 0
\]

and

\[
P_i q_{ij} = P_j q_{ji} \quad \text{for all } i \ne j
\tag{6.27}
\]

then the continuous-time Markov chain is time reversible and $P_i$ represents the limiting
probability of being in state i.

Proof. For fixed i we obtain upon summing Equation (6.27) over all $j : j \ne i$

\[
\sum_{j \ne i} P_i q_{ij} = \sum_{j \ne i} P_j q_{ji}
\]

or, since $\sum_{j \ne i} q_{ij} = v_i$,

\[
v_i P_i = \sum_{j \ne i} P_j q_{ji}
\]

Hence, the $P_i$s satisfy the balance equations and thus represent the limiting probabilities.
Because Equation (6.27) holds, the chain is time reversible. ■
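
Condition (6.27) is easy to test mechanically for a finite chain. The sketch below is a minimal illustration; the function name, the dictionary representation of the rates, and both sample chains are assumptions chosen for the example. The second chain is a one-directional cycle whose uniform distribution satisfies the balance equations but not (6.27).

```python
from fractions import Fraction


def is_time_reversible(P, q):
    """Check P_i q_ij == P_j q_ji for all i != j (Equation 6.27).

    P maps state -> probability; q maps (i, j) -> rate, missing pairs are 0.
    """
    states = list(P)
    return all(
        P[i] * q.get((i, j), 0) == P[j] * q.get((j, i), 0)
        for i in states for j in states if i != j
    )


# Hypothetical birth and death chain on {0, 1, 2}.
bd_rates = {(0, 1): 1, (1, 0): 3, (1, 2): 2, (2, 1): 4}
bd_probs = {0: Fraction(2, 3), 1: Fraction(2, 9), 2: Fraction(1, 9)}
print(is_time_reversible(bd_probs, bd_rates))  # True

# Hypothetical cycle 0 -> 1 -> 2 -> 0 with unit rates and uniform probabilities.
cycle_rates = {(0, 1): 1, (1, 2): 1, (2, 0): 1}
cycle_probs = {i: Fraction(1, 3) for i in range(3)}
print(is_time_reversible(cycle_probs, cycle_rates))  # False
```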

Example 6.18 Consider a set of n machines and a single repair facility to service them.
Suppose that when machine i, $i = 1, \ldots, n$, goes down it requires an exponentially
distributed amount of work with rate $\mu_i$ to get it back up. The repair facility divides its
efforts equally among all down components in the sense that whenever there are k down
machines, $1 \le k \le n$, each receives work at a rate of $1/k$ per unit time. Finally, suppose
that each time machine i goes back up it remains up for an exponentially distributed
time with rate $\lambda_i$.

The preceding can be analyzed as a continuous-time Markov chain having $2^n$ states
where the state at any time corresponds to the set of machines that are down at that
time. Thus, for instance, the state will be $(i_1, i_2, \ldots, i_k)$ when machines $i_1, \ldots, i_k$ are
down and all the others are up. The instantaneous transition rates are as follows:

\[
q_{(i_1, \ldots, i_{k-1}), (i_1, \ldots, i_k)} = \lambda_{i_k},
\]
\[
q_{(i_1, \ldots, i_k), (i_1, \ldots, i_{k-1})} = \mu_{i_k} / k
\]

where $i_1, \ldots, i_k$ are all distinct. This follows since the failure rate of machine $i_k$ is
always $\lambda_{i_k}$ and the repair rate of machine $i_k$ when there are k failed machines is $\mu_{i_k}/k$.
Hence, the time reversible equations from (6.27) are

\[
P(i_1, \ldots, i_k)\, \mu_{i_k} / k = P(i_1, \ldots, i_{k-1})\, \lambda_{i_k}
\]

or

\[
\begin{aligned}
P(i_1, \ldots, i_k)
&= \frac{k \lambda_{i_k}}{\mu_{i_k}} P(i_1, \ldots, i_{k-1}) \\
&= \frac{k \lambda_{i_k}}{\mu_{i_k}} \, \frac{(k-1) \lambda_{i_{k-1}}}{\mu_{i_{k-1}}} P(i_1, \ldots, i_{k-2})
   \qquad \text{upon iterating} \\
&\ \ \vdots \\
&= k! \prod_{j=1}^{k} (\lambda_{i_j} / \mu_{i_j})\, P(\phi)
\end{aligned}
\]

where $\phi$ is the state in which all components are working. Because

\[
P(\phi) + \sum_{i_1, \ldots, i_k} P(i_1, \ldots, i_k) = 1
\]

we see that

\[
P(\phi) = \left[ 1 + \sum_{i_1, \ldots, i_k} k! \prod_{j=1}^{k} (\lambda_{i_j} / \mu_{i_j}) \right]^{-1}
\tag{6.28}
\]

where the preceding sum is over all the $2^n - 1$ nonempty subsets $\{i_1, \ldots, i_k\}$ of
$\{1, 2, \ldots, n\}$. Hence, as the time reversible equations are satisfied for this choice of
probability vector it follows from Proposition 6.7 that the chain is time reversible and

\[
P(i_1, \ldots, i_k) = k! \prod_{j=1}^{k} (\lambda_{i_j} / \mu_{i_j})\, P(\phi)
\]

with $P(\phi)$ being given by (6.28).

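For instance, suppose there are two machines. Then, from the preceding we would
have

\[
\begin{aligned}
P(\phi) &= \frac{1}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2}, \\
P(1) &= \frac{\lambda_1/\mu_1}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2}, \\
P(2) &= \frac{\lambda_2/\mu_2}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2}, \\
P(1, 2) &= \frac{2\lambda_1\lambda_2}{\mu_1\mu_2 \left[ 1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2 \right]}
\end{aligned}
\]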

Consider a continuous-time Markov chain whose state space is S. We say that the
Markov chain is truncated to the set $A \subset S$ if $q_{ij}$ is changed to 0 for all $i \in A$, $j \notin A$.
That is, transitions out of the class A are no longer allowed, whereas ones in A continue
at the same rates as before. A useful result is that if the chain is time reversible, then
so is the truncated one.

Proposition 6.8 A time reversible chain with limiting probabilities $P_j$, $j \in S$, that
is truncated to the set $A \subset S$ and remains irreducible is also time reversible and has
limiting probabilities $P_j^A$ given by

\[
P_j^A = \frac{P_j}{\sum_{i \in A} P_i}, \qquad j \in A
\]

Proof. By Proposition 6.7 we need to show that, with $P_j^A$ as given,

\[
P_i^A q_{ij} = P_j^A q_{ji} \quad \text{for } i \in A,\ j \in A
\]

or, equivalently,

\[
P_i q_{ij} = P_j q_{ji} \quad \text{for } i \in A,\ j \in A
\]

But this follows since the original chain is, by assumption, time reversible. ■
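
Proposition 6.8 amounts to renormalizing the original limiting probabilities over A. The following sketch illustrates this on a small reversible chain; the helper names and the sample chain (up at rate 1, down at rate 2, on four states) are assumptions made for the example.

```python
from fractions import Fraction


def truncate(P, A):
    """Renormalize limiting probabilities P (dict state -> prob) over the set A."""
    mass = sum(P[j] for j in A)
    return {j: P[j] / mass for j in A}


def detailed_balance(P, q):
    """True if P_i q_ij == P_j q_ji for every rate whose two states are both in P."""
    return all(P[i] * r == P[j] * q.get((j, i), 0)
               for (i, j), r in q.items() if i in P and j in P)


# Hypothetical reversible chain on {0, 1, 2, 3}: up at rate 1, down at rate 2.
q = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (1, 0): 2, (2, 1): 2, (3, 2): 2}
P = {j: Fraction(8, 15) / 2**j for j in range(4)}
P_A = truncate(P, {0, 1, 2})  # rates between states 2 and 3 are ignored below

print(P_A)
print(detailed_balance(P, q), detailed_balance(P_A, q))  # True True
```

Skipping rates that involve a state outside A in `detailed_balance` plays the role of setting those rates to 0 in the truncated chain.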

Example 6.19 Consider an M/M/1 queue in which arrivals finding N in the system
do not enter. This finite capacity system can be regarded as a truncation of the M/M/1
queue to the set of states $A = \{0, 1, \ldots, N\}$. Since the number in the system in the
M/M/1 queue is time reversible and has limiting probabilities $P_j = (\lambda/\mu)^j (1 - \lambda/\mu)$,
it follows from Proposition 6.8 that the finite capacity model is also time reversible and
has limiting probabilities given by

\[
P_j = \frac{(\lambda/\mu)^j}{\sum_{i=0}^{N} (\lambda/\mu)^i}, \qquad j = 0, 1, \ldots, N
\]

Another useful result is given by the following proposition, whose proof is left as
an exercise.

Proposition 6.9 If $\{X_i(t), t \ge 0\}$ are, for $i = 1, \ldots, n$, independent time reversible
continuous-time Markov chains, then the vector process $\{(X_1(t), \ldots, X_n(t)), t \ge 0\}$
is also a time reversible continuous-time Markov chain.

Example 6.20 Consider an n-component system where component i, $i = 1, \ldots, n$,
functions for an exponential time with rate $\lambda_i$ and then fails; upon failure, repair begins
on component i, with the repair taking an exponentially distributed time with rate $\mu_i$.
Once repaired, a component is as good as new. The components act independently
except that when there is only one working component the system is temporarily shut
down until a repair has been completed; it then starts up again with two working
components.

(a) What proportion of time is the system shut down?

(b) What is the (limiting) average number of components that are being repaired?

Solution: Consider first the system without the restriction that it is shut down when
a single component is working. Letting $X_i(t)$, $i = 1, \ldots, n$, equal 1 if component i is
working at time t and 0 if it failed, then $\{X_i(t), t \ge 0\}$, $i = 1, \ldots, n$, are independent
birth and death processes. Because a birth and death process is time reversible, it
follows from Proposition 6.9 that the process $\{(X_1(t), \ldots, X_n(t)), t \ge 0\}$ is also
time reversible. Now, with

\[
P_i(j) = \lim_{t \to \infty} P\{X_i(t) = j\}, \qquad j = 0, 1
\]

we have

\[
P_i(1) = \frac{\mu_i}{\mu_i + \lambda_i}, \qquad P_i(0) = \frac{\lambda_i}{\mu_i + \lambda_i}
\]

Also, with

\[
P(j_1, \ldots, j_n) = \lim_{t \to \infty} P\{X_i(t) = j_i,\ i = 1, \ldots, n\}
\]

it follows, by independence, that

\[
P(j_1, \ldots, j_n) = \prod_{i=1}^{n} P_i(j_i), \qquad j_i = 0, 1,\ i = 1, \ldots, n
\]

Now, note that shutting down the system when only one component is working is
equivalent to truncating the preceding unconstrained system to the set consisting of
all states except the one having all components down. Therefore, with $P_T$ denoting
a probability for the truncated system, we have from Proposition 6.8 that

\[
P_T(j_1, \ldots, j_n) = \frac{P(j_1, \ldots, j_n)}{1 - C}, \qquad \sum_{i=1}^{n} j_i > 0
\]

where

\[
C = P(0, \ldots, 0) = \prod_{j=1}^{n} \lambda_j / (\mu_j + \lambda_j)
\]

Hence, letting $(0, 1_i) = (0, \ldots, 0, 1, 0, \ldots, 0)$ be the n vector of zeroes and ones
whose single 1 is in the ith place, we have

\[
\begin{aligned}
P_T(\text{system is shut down})
&= \sum_{i=1}^{n} P_T(0, 1_i) \\
&= \frac{1}{1 - C} \sum_{i=1}^{n} \left( \frac{\mu_i}{\mu_i + \lambda_i} \right) \prod_{j \ne i} \left( \frac{\lambda_j}{\mu_j + \lambda_j} \right) \\
&= \frac{C \sum_{i=1}^{n} \mu_i / \lambda_i}{1 - C}
\end{aligned}
\]

Let R denote the number of components being repaired. Then with $I_i$ equal to 1
if component i is being repaired and 0 otherwise, we have for the unconstrained
(nontruncated) system that

\[
E[R] = E\left[ \sum_{i=1}^{n} I_i \right] = \sum_{i=1}^{n} P_i(0) = \sum_{i=1}^{n} \lambda_i / (\mu_i + \lambda_i)
\]

But, in addition,

\[
\begin{aligned}
E[R] &= E[R \mid \text{all components are in repair}]\, C
      + E[R \mid \text{not all components are in repair}]\, (1 - C) \\
     &= nC + E_T[R](1 - C)
\end{aligned}
\]

implying that

\[
E_T[R] = \frac{\sum_{i=1}^{n} \lambda_i / (\mu_i + \lambda_i) - nC}{1 - C}
\]
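
The two closed forms of Example 6.20 can be compared with a direct enumeration of the truncated state space. The sketch below is one way to do this for small n; the function name and the sample rates are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def truncated_system(lam, mu):
    """Return (P_T(shut down), E_T[R]) by enumerating all states except all-down."""
    n = len(lam)
    up = [Fraction(m) / (Fraction(m) + Fraction(l)) for l, m in zip(lam, mu)]  # P_i(1)
    probs = {}
    for state in product((0, 1), repeat=n):
        if sum(state) == 0:
            continue  # the all-down state is removed by the truncation
        w = Fraction(1)
        for j, p in zip(state, up):
            w *= p if j == 1 else 1 - p
        probs[state] = w
    total = sum(probs.values())  # equals 1 - C
    probs = {s: w / total for s, w in probs.items()}
    shut_down = sum(p for s, p in probs.items() if sum(s) == 1)
    mean_in_repair = sum((n - sum(s)) * p for s, p in probs.items())
    return shut_down, mean_in_repair


# Hypothetical rates for n = 3 components.
shut_down, mean_in_repair = truncated_system([1, 2, 3], [2, 3, 4])
print(shut_down, mean_in_repair)  # 29/99 104/99
```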


6.7 The Reversed Chain 

Consider an ergodic continuous-time Markov chain whose state space is S and which
has instantaneous transition rates $q_{ij}$ and limiting probabilities $P_i$, $i \in S$, and suppose
that this chain has been in operation for a long (in theory, an infinite) time. Then, it
follows from results in the previous section that the process of states going backwards
in time is also a continuous-time Markov chain, having instantaneous transition rates
$q^*_{ij}$ that satisfy

\[
P_i q^*_{ij} = P_j q_{ji}, \quad i \ne j
\]

The reverse chain is a very useful concept even in cases where it differs from the forward
chain (that is, even in cases where the chain is not time reversible).

To begin, note that the amount of time the reverse chain spends in state i during a
visit is exponential with rate $v^*_i = \sum_{j \ne i} q^*_{ij}$. Because the amount of time the process
spends in a state i during a visit will be the same whether the chain is observed in the
usual (forward) or in the reverse direction of time, it follows that the distribution of
the time that the reverse chain spends in state i during a visit should be the same as the
distribution of the time that the forward chain spends in that state during a visit. That
is, we should have that

\[
v^*_i = v_i
\]

Moreover, because the proportion of time that the chain spends in state i would be
the same whether one was observing the chain in the usual (forward) direction of time
or in the reverse direction, the two chains should intuitively have the same limiting
probabilities.

Proposition 6.10 Let a continuous-time Markov chain have instantaneous transition
rates $q_{ij}$ and limiting probabilities $P_i$, $i \in S$, and let $q^*_{ij}$ be the instantaneous rates of
the reversed chain. Then, with $v^*_i = \sum_{j \ne i} q^*_{ij}$ and $v_i = \sum_{j \ne i} q_{ij}$,

\[
v^*_i = v_i
\]

Moreover $P_i$, $i \in S$, are also the limiting probabilities of the reversed chain.

Proof. Using that $P_i q^*_{ij} = P_j q_{ji}$ we see that

\[
\sum_{j \ne i} q^*_{ij} = \sum_{j \ne i} P_j q_{ji} / P_i = v_i P_i / P_i = v_i
\]

where the preceding used (from (6.18)) that $\sum_{j \ne i} P_j q_{ji} = v_i P_i$.

That the reversed chain has the same limiting probabilities as does the forward
chain can be formally proven by showing that the $P_j$ satisfy the balance equations of
the reversed chain:

\[
v^*_j P_j = \sum_{k \ne j} P_k q^*_{kj}, \quad j \in S
\]

Now, because $v^*_j = v_j$ and $P_k q^*_{kj} = P_j q_{jk}$, the preceding equations are equivalent to

\[
v_j P_j = \sum_{k \ne j} P_j q_{jk}, \quad j \in S
\]

which hold for every j because $v_j = \sum_{k \ne j} q_{jk}$ by definition. Hence the $P_j$ satisfy the
balance equations of the reversed chain. ■

That the long-run proportions for the reverse chain are the same as for the forward
chain makes it easy to understand why

\[
P_i q^*_{ij} = P_j q_{ji}, \quad i \ne j
\]

Because $P_i$ is the proportion of time the reverse chain spends in state i and $q^*_{ij}$ is
the rate, when in i, that it makes a transition into state j, it follows that $P_i q^*_{ij}$ is the
rate at which the reversed chain makes transitions from i to j. Similarly, $P_j q_{ji}$ is the
rate at which the forward chain makes transitions from j to i. Because every transition
from j to i in the (forward) Markov chain would be seen as a transition from i to j by
someone looking backwards in time, it is evident that $P_i q^*_{ij} = P_j q_{ji}$.
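
The relation $P_i q^*_{ij} = P_j q_{ji}$ gives a direct recipe for the reversed rates once the limiting probabilities are known. The sketch below applies it to a small chain that is not time reversible; the function names and the sample chain (a cycle with rates 1, 2, 3) are assumptions for illustration.

```python
from fractions import Fraction


def reversed_rates(P, q):
    """q*_ij = P_j q_ji / P_i for i != j, from the forward rates q[(j, i)]."""
    return {(i, j): P[j] * rate / P[i] for (j, i), rate in q.items()}


def exit_rates(q, states):
    """v_i = sum over j of q_ij."""
    return {i: sum(r for (a, _), r in q.items() if a == i) for i in states}


# Hypothetical cycle 0 -> 1 -> 2 -> 0 with rates 1, 2, 3.
q = {(0, 1): 1, (1, 2): 2, (2, 0): 3}
# Its limiting probabilities are proportional to 1/v_i = 1, 1/2, 1/3.
P = {0: Fraction(6, 11), 1: Fraction(3, 11), 2: Fraction(2, 11)}

q_star = reversed_rates(P, q)
print(q_star)  # the cycle runs the other way: 1 -> 0, 2 -> 1, 0 -> 2
print(exit_rates(q_star, P) == exit_rates(q, P))  # True, i.e. v*_i = v_i
```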

The following proposition shows that if one can find a solution of the "reverse chain
equations" then the solution is unique and yields the limiting probabilities.

Proposition 6.11 Let $q_{ij}$ be the transition rates of an irreducible continuous-time
Markov chain. If one can find values $q^*_{ij}$ and a collection of positive values $P_i$ that sum
to 1, such that

\[
P_i q^*_{ij} = P_j q_{ji}, \quad i \ne j
\tag{6.29}
\]

and

\[
\sum_{j \ne i} q^*_{ij} = \sum_{j \ne i} q_{ij}, \quad i \in S
\tag{6.30}
\]

then $q^*_{ij}$ are the transition rates of the reversed chain and $P_i$ are the limiting probabilities
(for both chains).

Proof. We show that the $P_i$ are the limiting probabilities by showing that they satisfy
the balance Equations (6.18). To show this, sum (6.29) over all j, $j \ne i$, to obtain

\[
P_i \sum_{j \ne i} q^*_{ij} = \sum_{j \ne i} P_j q_{ji}, \quad i \in S
\]

Using (6.30) now shows that

\[
P_i \sum_{j \ne i} q_{ij} = \sum_{j \ne i} P_j q_{ji}
\]

Because $\sum_i P_i = 1$ we see that the $P_i$ satisfy the balance equations and are thus the
limiting probabilities. Because $P_i q^*_{ij} = P_j q_{ji}$ it also follows that $q^*_{ij}$ are the transition
rates of the reversed chain. ■

Suppose now that the structure of the continuous-time Markov chain enables us
to make a guess as to the transition rates of the reversed chain. Assuming that this
guess satisfies Equation (6.30) of Proposition 6.11, we can then verify the correctness
of our guess by seeing whether there are probabilities that satisfy Equations (6.29). If
there are such probabilities, our guess is correct and we have also found the limiting
probabilities; if there are not, our guess is incorrect.

Example 6.21 Consider a continuous-time Markov chain whose states are the
nonnegative integers. Suppose that a transition out of state 0 goes to state i with probability
$\alpha_i$, $\sum_{i=1}^{\infty} \alpha_i = 1$; whereas a transition out of state i > 0 always goes to state i − 1.
That is, the instantaneous transition rates of this chain are, for i > 0

\[
q_{0i} = v_0 \alpha_i
\]
\[
q_{i,i-1} = v_i
\]

Let N be a random variable having the distribution of the next state from state 0; that
is, $P(N = i) = \alpha_i$, i > 0. Also, say that a cycle begins each time the chain goes to
state 0. Because the forward chain goes from 0 to N and then continually moves one
step closer to 0 until reaching that state, it follows that the states in the reverse chain
would continually increase by 1 until it reaches N at which point it goes back to state
0 (see Figure 6.2).

forward transitions : $N \to N - 1 \to \cdots \to 2 \to 1 \to 0$

reverse transitions : $0 \to 1 \to 2 \to \cdots \to N - 1 \to N$

Figure 6.2 Forward and Reverse Transitions.

Now, if the chain is currently in state i then the value of N for that cycle must be at
least i. Hence, the next state of the reversed chain will be 0 with probability

\[
P(N = i \mid N \ge i) = \frac{P(N = i)}{P(N \ge i)} = \frac{\alpha_i}{P(N \ge i)}
\]

and will be i + 1 with probability

\[
1 - P(N = i \mid N \ge i) = P(N \ge i + 1 \mid N \ge i) = \frac{P(N \ge i + 1)}{P(N \ge i)}
\]

Because the reversed chain spends the same time in a state during each visit as does
the forward chain, it thus appears that the transition rates of the reversed chain are

\[
q^*_{i,0} = v_i \frac{\alpha_i}{P(N \ge i)}, \qquad i > 0
\]
\[
q^*_{i,i+1} = v_i \frac{P(N \ge i + 1)}{P(N \ge i)}, \qquad i \ge 0
\]
i S? 0 


Based on the preceding guess, the reversed time equations $P_0 q_{0i} = P_i q^*_{i0}$ and
$P_i q_{i,i-1} = P_{i-1} q^*_{i-1,i}$ become

\[
P_0 v_0 \alpha_i = P_i v_i \frac{\alpha_i}{P(N \ge i)}, \qquad i \ge 1
\tag{6.31}
\]

and

\[
P_i v_i = P_{i-1} v_{i-1} \frac{P(N \ge i)}{P(N \ge i - 1)}, \qquad i \ge 1
\tag{6.32}
\]

The set of Equations (6.31) gives

\[
P_i = P_0 v_0 P(N \ge i) / v_i, \qquad i \ge 1
\]

As the preceding equation is also valid when i = 0 (since $P(N \ge 0) = 1$), we obtain
upon summing over all i that

\[
1 = \sum_i P_i = P_0 v_0 \sum_{i=0}^{\infty} P(N \ge i) / v_i
\]

Thus,

\[
P_i = \frac{P(N \ge i) / v_i}{\sum_{i=0}^{\infty} P(N \ge i) / v_i}, \qquad i \ge 0
\]

To show that the set of Equations (6.32) is also satisfied by the preceding values of $P_i$,
note that, with $C = 1 / \sum_{i=0}^{\infty} P(N \ge i) / v_i$,

\[
\frac{v_i P_i}{P(N \ge i)} = C = \frac{v_{i-1} P_{i-1}}{P(N \ge i - 1)}
\]

which immediately shows that Equations (6.32) are also satisfied. Because we chose
the transition rates of the reversed chain to be such that it spent as much time in state i
during a visit as does the forward chain, there is no need to check Condition (6.30) of
Proposition 6.11, and so the stationary probabilities are as given. ■
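
As a sanity check on Example 6.21, the sketch below computes $P_i \propto P(N \ge i)/v_i$ for a distribution of N with finite support and verifies the forward balance equations. Restricting N to finitely many values, the function names, and the sample numbers are all assumptions made so that the check is a finite computation.

```python
from fractions import Fraction


def example_6_21_probs(alpha, v):
    """Limiting probabilities P_i proportional to P(N >= i) / v_i.

    alpha[i] = P(N = i) for i = 1..K with alpha[0] = 0, and v[i] is the rate
    out of state i (illustrative inputs; N is assumed to have finite support).
    """
    K = len(alpha) - 1
    tail = [sum(alpha[i:], Fraction(0)) for i in range(K + 1)]  # P(N >= i)
    weights = [tail[i] / v[i] for i in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def satisfies_balance(P, alpha, v):
    """Check the forward balance equations v_i P_i = (rate into state i)."""
    K = len(P) - 1
    ok = v[0] * P[0] == v[1] * P[1]  # state 0 is entered only from state 1
    for i in range(1, K + 1):
        inflow = P[0] * v[0] * alpha[i]  # jump from 0 directly to i
        if i < K:
            inflow += P[i + 1] * v[i + 1]  # step down from i + 1
        ok = ok and v[i] * P[i] == inflow
    return ok


alpha = [Fraction(0), Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
v = [1, 2, 3, 4]
P = example_6_21_probs(alpha, v)
print(P)  # [24/41, 12/41, 4/41, 1/41] as Fractions
print(satisfies_balance(P, alpha, v))  # True
```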

Example 6.22 (A Sequential Queueing System) Consider a two-server queueing
system in which customers arrive at server 1 in accordance with a Poisson process with
rate $\lambda$. An arrival at server 1 either enters service if server 1 is free or joins the queue
if server 1 is busy. After completion of service at server 1 the customer moves over to
server 2, where it either enters service if server 2 is free or joins its queue otherwise.
After completion of service at server 2 a customer departs the system. The service
times at servers 1 and 2 are exponential with rates $\mu_1$ and $\mu_2$ respectively. All service
times are independent and are also independent of the arrival process.

The preceding model can be analyzed as a continuous-time Markov chain whose
state is (n, m) if there are currently n customers with server 1 and m with server 2. The
instantaneous transition rates of this chain are

\[
q_{(n-1,m),(n,m)} = \lambda, \qquad n > 0
\]
\[
q_{(n+1,m-1),(n,m)} = \mu_1, \qquad m > 0
\]
\[
q_{(n,m+1),(n,m)} = \mu_2
\]

To find the limiting probabilities, let us first consider the chain going backwards in
time. Because in real time the total number in the system decreases at moments when
customers depart server 2, looking backwards the total number in the system will at
those moments increase by having an added customer at server 2. Similarly while in
real time the number will increase when a customer arrives at server 1, in the reverse
process at that moment there will be a decrease in the number at server 1. Because
the times spent in service at server i will be the same whether looking in forward or
in reverse time, it appears that the reverse process is a two-server system in which
customers arrive first at server 2, then go to server 1, and then depart the system, with
their service times at server i being exponential with rate $\mu_i$, i = 1, 2. Now the arrival
rate to server 2 in the reverse process is equal to the departure rate from the system in
the forward process and this must equal the arrival rate $\lambda$ of the forward process. (If the
departure rate of the forward process was less than the arrival rate, then the queue size
would build to infinity and there would not be any limiting probabilities.) Although it
is not clear that the arrival process of customers to server 2 in the reverse process is
a Poisson process, let us guess that this is the case and then use Proposition 6.11 to
determine whether our guess is correct.

So, let us guess that the reverse process is a sequential queue where customers arrive
at server 2 according to a Poisson process with rate $\lambda$, and after receiving service at
server 2 move over to server 1, and after receiving service at server 1 depart the system.
In addition, the service times at server i are exponential with rate $\mu_i$, i = 1, 2. Now,
if this were true then the transition rates of the reverse chain would be

\[
q^*_{(n,m),(n-1,m)} = \mu_1, \qquad n > 0
\]
\[
q^*_{(n,m),(n+1,m-1)} = \mu_2, \qquad m > 0
\]
\[
q^*_{(n,m),(n,m+1)} = \lambda
\]

The rate at which a chain with transition rates $q^*$ departs from state (n, m) is

\[
q^*_{(n,m),(n-1,m)} + q^*_{(n,m),(n+1,m-1)} + q^*_{(n,m),(n,m+1)} = \mu_1 I\{n > 0\} + \mu_2 I\{m > 0\} + \lambda
\]

where $I\{k > 0\}$ is equal to 1 when k > 0 and is equal to 0 otherwise. As the preceding
is also the rate at which the forward process departs from state (n, m), the Condition
(6.30) of Proposition 6.11 is satisfied.

Using the preceding conjectured reverse time transition rates, the reverse time equa¬ 
tions would be 

Pn — l,m k = Pn,m Ml’ U > 0 (6.33) 

Pn+\,m —1 Ml = Pn,m M2’ ™ > 0 (6.34) 

Pn,m +1 M2 = Pfl jn X (6.35) 

Writing (6.33) as $P_{n,m} = (\lambda/\mu_1) P_{n-1,m}$ and iterating yields

$$P_{n,m} = (\lambda/\mu_1)^2 P_{n-2,m} = \cdots = (\lambda/\mu_1)^n P_{0,m}$$

Letting $n = 0$, $m = m - 1$ in Equation (6.35) shows that $P_{0,m} = (\lambda/\mu_2) P_{0,m-1}$, which
yields upon iteration that

$$P_{0,m} = (\lambda/\mu_2)^2 P_{0,m-2} = \cdots = (\lambda/\mu_2)^m P_{0,0}$$

Hence, the conjectured reverse time equations imply that

$$P_{n,m} = (\lambda/\mu_1)^n (\lambda/\mu_2)^m P_{0,0}$$

Using that $\sum_n \sum_m P_{n,m} = 1$ gives

$$P_{n,m} = (\lambda/\mu_1)^n (1 - \lambda/\mu_1)\,(\lambda/\mu_2)^m (1 - \lambda/\mu_2)$$

As it is easy to check that all the conjectured reverse time Equations (6.33), (6.34), and
(6.35) are satisfied for the preceding choice of $P_{n,m}$, it follows that they are the limiting
probabilities. Hence, we have shown that in steady state the numbers of customers at
the two servers are independent, with the number at server i distributed as the number
in the system of an M/M/1 queue with Poisson arrival rate $\lambda$ and exponential service
rate $\mu_i$, $i = 1, 2$. (See Example 6.14.)
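
The check that the product-form probabilities satisfy Equations (6.33), (6.34), and (6.35) can also be carried out numerically. The following is a minimal sketch in Python; the function names `tandem_prob` and `max_residual`, the grid size, and the sample rates are illustrative assumptions, not part of the text.

```python
def tandem_prob(n, m, lam, mu1, mu2):
    """Conjectured limiting probability P_{n,m} for the two-server tandem queue."""
    r1, r2 = lam / mu1, lam / mu2
    return (r1 ** n) * (1 - r1) * (r2 ** m) * (1 - r2)


def max_residual(lam, mu1, mu2, size=15):
    """Largest absolute violation of (6.33)-(6.35) over states 0 <= n, m < size."""
    def p(n, m):
        return tandem_prob(n, m, lam, mu1, mu2)

    worst = 0.0
    for n in range(size):
        for m in range(size):
            if n > 0:  # Equation (6.33)
                worst = max(worst, abs(p(n - 1, m) * lam - p(n, m) * mu1))
            if m > 0:  # Equation (6.34)
                worst = max(worst, abs(p(n + 1, m - 1) * mu1 - p(n, m) * mu2))
            # Equation (6.35) holds for every state
            worst = max(worst, abs(p(n, m + 1) * mu2 - p(n, m) * lam))
    return worst


if __name__ == "__main__":
    # Sample rates (assumed for illustration); requires lam < mu1 and lam < mu2.
    print(max_residual(lam=1.0, mu1=2.0, mu2=3.0))
```

The residual should be zero up to floating-point rounding for any rates with $\lambda < \mu_1$ and $\lambda < \mu_2$.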


6.8 Uniformization 


Consider a continuous-time Markov chain in which the mean time spent in a state is
the same for all states. That is, suppose that $v_i = v$ for all states i. In this case since
the amount of time spent in each state during a visit is exponentially distributed with
rate v, it follows that if we let $N(t)$ denote the number of state transitions by time t,
then $\{N(t), t \geq 0\}$ will be a Poisson process with rate v.

To compute the transition probabilities $P_{ij}(t)$, we can condition on $N(t)$:

$$\begin{aligned}
P_{ij}(t) &= P\{X(t) = j \mid X(0) = i\} \\
&= \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i, N(t) = n\}\, P\{N(t) = n \mid X(0) = i\} \\
&= \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i, N(t) = n\}\, e^{-vt} \frac{(vt)^n}{n!}
\end{aligned}$$

Now, the fact that there have been n transitions by time t tells us something about the
amount of time spent in each of the first n states visited, but since the distribution of
time spent in each state is the same for all states, it follows that knowing that $N(t) = n$
gives us no information about which states were visited. Hence,

$$P\{X(t) = j \mid X(0) = i, N(t) = n\} = P^n_{ij}$$

where $P^n_{ij}$ is just the n-stage transition probability associated with the discrete-time
Markov chain with transition probabilities $P_{ij}$; and so when $v_i \equiv v$

$$P_{ij}(t) = \sum_{n=0}^{\infty} P^n_{ij}\, e^{-vt} \frac{(vt)^n}{n!} \tag{6.36}$$


Equation (6.36) is often useful from a computational point of view since it enables us
to approximate $P_{ij}(t)$ by taking a partial sum and then computing (by matrix multiplication
of the transition probability matrix) the relevant n-stage probabilities $P^n_{ij}$.

Whereas the applicability of Equation (6.36) would appear to be quite limited since
it supposes that $v_i \equiv v$, it turns out that most Markov chains can be put in that form by
the trick of allowing fictitious transitions from a state to itself. To see how this works,
consider any Markov chain for which the $v_i$ are bounded, and let v be any number such
that

$$v_i \leq v, \quad \text{for all } i \tag{6.37}$$

When in state i, the process actually leaves at rate $v_i$; but this is equivalent to supposing
that transitions occur at rate v, but only the fraction $v_i/v$ of transitions are real ones (and
thus real transitions occur at rate $v_i$) and the remaining fraction $1 - v_i/v$ are fictitious
transitions that leave the process in state i. In other words, any Markov chain satisfying
Condition (6.37) can be thought of as being a process that spends an exponential amount
of time with rate v in state i and then makes a transition to j with probability $P^*_{ij}$, where

$$P^*_{ij} = \begin{cases} 1 - \dfrac{v_i}{v}, & j = i \\[1ex] \dfrac{v_i}{v} P_{ij}, & j \neq i \end{cases} \tag{6.38}$$

Hence, from Equation (6.36) we have that the transition probabilities can be computed
by

$$P_{ij}(t) = \sum_{n=0}^{\infty} P^{*n}_{ij}\, e^{-vt} \frac{(vt)^n}{n!}$$

where $P^{*n}_{ij}$ are the n-stage transition probabilities corresponding to Equation (6.38).
This technique of uniformizing the rate at which a transition occurs from each state by
introducing transitions from a state to itself is known as uniformization.
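
The uniformization recipe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `mat_mul` and `uniformized_p`, the default of 60 terms, and the choice of the largest $v_i$ as the uniform rate are all hypothetical choices, and the truncation point would need to grow with $vt$ in practice.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def uniformized_p(v, p, t, terms=60, rate=None):
    """Approximate the matrix of P_ij(t) by a partial sum of the uniformized series.

    v[i]    : rate at which the chain leaves state i
    p[i][j] : jump probabilities of the original chain (p[i][i] == 0)
    rate    : uniform rate; defaults to max(v), which satisfies Condition (6.37)
    """
    size = len(v)
    rate = max(v) if rate is None else rate
    # Equation (6.38): fictitious self-transitions with probability 1 - v_i / rate.
    p_star = [[1 - v[i] / rate if i == j else (v[i] / rate) * p[i][j]
               for j in range(size)] for i in range(size)]
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    weight = math.exp(-rate * t)  # Poisson probability of zero transitions
    result = [[0.0] * size for _ in range(size)]
    for n in range(terms):
        for i in range(size):
            for j in range(size):
                result[i][j] += power[i][j] * weight
        power = mat_mul(power, p_star)   # next n-stage matrix
        weight *= rate * t / (n + 1)     # next Poisson probability
    return result
```

Because every term in the sum is nonnegative, the partial sum avoids the cancellation that arises when a matrix with entries of both signs is raised to powers. For very large $vt$ the starting weight $e^{-vt}$ can underflow, so this sketch is meant for moderate values only.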

Example 6.23 Let us reconsider Example 6.11, which models the workings of a
machine (either on or off) as a two-state continuous-time Markov chain with

$$P_{01} = P_{10} = 1,$$
$$v_0 = \lambda, \quad v_1 = \mu$$

Letting $v = \lambda + \mu$, the uniformized version of the preceding is to consider it a
continuous-time Markov chain with

$$P_{00} = \frac{\mu}{\lambda + \mu} = 1 - P_{01},$$
$$P_{10} = \frac{\mu}{\lambda + \mu} = 1 - P_{11},$$
$$v_i = \lambda + \mu, \quad i = 0, 1$$

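As $P_{00} = P_{10}$, it follows that the probability of a transition into state 0 is equal to
$\mu/(\lambda + \mu)$ no matter what the present state. Because a similar result is true for state 1,
it follows that the n-stage transition probabilities are given by

$$P^n_{i0} = \frac{\mu}{\lambda + \mu}, \quad n \geq 1,\ i = 0, 1$$
$$P^n_{i1} = \frac{\lambda}{\lambda + \mu}, \quad n \geq 1,\ i = 0, 1$$

Hence,

$$\begin{aligned}
P_{00}(t) &= \sum_{n=0}^{\infty} P^n_{00}\, e^{-(\lambda+\mu)t} \frac{[(\lambda+\mu)t]^n}{n!} \\
&= e^{-(\lambda+\mu)t} + \sum_{n=1}^{\infty} \left(\frac{\mu}{\lambda+\mu}\right) e^{-(\lambda+\mu)t} \frac{[(\lambda+\mu)t]^n}{n!} \\
&= e^{-(\lambda+\mu)t} + \left[1 - e^{-(\lambda+\mu)t}\right] \frac{\mu}{\lambda+\mu} \\
&= \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t}
\end{aligned}$$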

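Similarly,

$$\begin{aligned}
P_{11}(t) &= \sum_{n=0}^{\infty} P^n_{11}\, e^{-(\lambda+\mu)t} \frac{[(\lambda+\mu)t]^n}{n!} \\
&= e^{-(\lambda+\mu)t} + \left[1 - e^{-(\lambda+\mu)t}\right] \frac{\lambda}{\lambda+\mu} \\
&= \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)t}
\end{aligned}$$
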
The remaining probabilities are

$$P_{01}(t) = 1 - P_{00}(t) = \frac{\lambda}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right]$$
$$P_{10}(t) = 1 - P_{11}(t) = \frac{\mu}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right]$$
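
As a sanity check on Example 6.23, the closed form for $P_{00}(t)$ can be compared with a partial sum of the uniformized series, using the fact that the n-stage probability of being in state 0 equals $\mu/(\lambda+\mu)$ for every $n \geq 1$. The sketch below assumes illustrative function names and an arbitrary truncation at 80 terms.

```python
import math


def p00_closed_form(lam, mu, t):
    """P_00(t) for the two-state chain, from the final line of the derivation."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def p00_series(lam, mu, t, terms=80):
    """Partial sum of the uniformized series for P_00(t) with v = lam + mu."""
    s = lam + mu
    weight = math.exp(-s * t)  # Poisson probability of n = 0 transitions
    total = weight             # the 0-stage probability of staying in state 0 is 1
    for n in range(1, terms):
        weight *= s * t / n    # Poisson probability of exactly n transitions
        total += (mu / s) * weight
    return total


if __name__ == "__main__":
    # Sample parameters, assumed for illustration only.
    print(p00_closed_form(1.0, 2.0, 0.7), p00_series(1.0, 2.0, 0.7))
```

The two values should agree to within rounding error when $(\lambda+\mu)t$ is small relative to the number of terms kept.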


Example 6.24 Consider the two-state chain of Example 6.23 and suppose that the
initial state is state 0. Let $O(t)$ denote the total amount of time that the process is in state
0 during the interval $(0, t)$. The random variable $O(t)$ is often called the occupation
time. We will now compute its mean.

If we let

$$I(s) = \begin{cases} 1, & \text{if } X(s) = 0 \\ 0, & \text{if } X(s) = 1 \end{cases}$$

then we can represent the occupation time by

$$O(t) = \int_0^t I(s)\, ds$$

Taking expectations and using the fact that we can take the expectation inside the
integral sign (since an integral is basically a sum), we obtain

$$\begin{aligned}
E[O(t)] &= \int_0^t E[I(s)]\, ds \\
&= \int_0^t P\{X(s) = 0\}\, ds \\
&= \int_0^t P_{00}(s)\, ds \\
&= \frac{\mu}{\lambda+\mu}\, t + \frac{\lambda}{(\lambda+\mu)^2}\left\{1 - e^{-(\lambda+\mu)t}\right\}
\end{aligned}$$

where the final equality follows by integrating

$$P_{00}(s) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)s}$$

(For another derivation of $E[O(t)]$, see Exercise 45.)
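
The integration step can be checked numerically. The following sketch compares the closed form for the mean occupation time with a composite Simpson's rule approximation of the integral of $P_{00}(s)$; the function names, the step count, and the sample parameters are assumptions made for illustration.

```python
import math


def p00(lam, mu, s):
    """P_00(s) for the two-state chain started in state 0."""
    r = lam + mu
    return mu / r + (lam / r) * math.exp(-r * s)


def mean_occupation_closed_form(lam, mu, t):
    """E[O(t)] from the final equality of Example 6.24."""
    r = lam + mu
    return (mu / r) * t + (lam / r ** 2) * (1 - math.exp(-r * t))


def mean_occupation_numeric(lam, mu, t, steps=1000):
    """Integrate P_00 over (0, t) by composite Simpson's rule (steps must be even)."""
    h = t / steps
    total = p00(lam, mu, 0.0) + p00(lam, mu, t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * p00(lam, mu, k * h)
    return total * h / 3


if __name__ == "__main__":
    print(mean_occupation_closed_form(1.0, 2.0, 2.0),
          mean_occupation_numeric(1.0, 2.0, 2.0))
```

Note that as t grows the closed form behaves like $\mu t/(\lambda+\mu)$ plus a constant, which matches the long-run proportion of time spent in state 0.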


6.9 Computing the Transition Probabilities 

For any pair of states i and j, let

$$r_{ij} = \begin{cases} q_{ij}, & \text{if } i \neq j \\ -v_i, & \text{if } i = j \end{cases}$$

Using this notation, we can rewrite the Kolmogorov backward equations

$$P'_{ij}(t) = \sum_{k \neq i} q_{ik} P_{kj}(t) - v_i P_{ij}(t)$$

and the forward equations

$$P'_{ij}(t) = \sum_{k \neq j} q_{kj} P_{ik}(t) - v_j P_{ij}(t)$$

as follows:

$$P'_{ij}(t) = \sum_k r_{ik} P_{kj}(t) \quad \text{(backward)}$$
$$P'_{ij}(t) = \sum_k r_{kj} P_{ik}(t) \quad \text{(forward)}$$


This representation is especially revealing when we introduce matrix notation. Define
the matrices $\mathbf{R}$, $\mathbf{P}(t)$, and $\mathbf{P}'(t)$ by letting the element in row i, column j of these
matrices be, respectively, $r_{ij}$, $P_{ij}(t)$, and $P'_{ij}(t)$. Since the backward equations say that
the element in row i, column j of the matrix $\mathbf{P}'(t)$ can be obtained by multiplying the
ith row of the matrix $\mathbf{R}$ by the jth column of the matrix $\mathbf{P}(t)$, it is equivalent to the
matrix equation

$$\mathbf{P}'(t) = \mathbf{R}\mathbf{P}(t) \tag{6.39}$$

Similarly, the forward equations can be written as

$$\mathbf{P}'(t) = \mathbf{P}(t)\mathbf{R} \tag{6.40}$$

Now, just as the solution of the scalar differential equation

$$f'(t) = c f(t)$$

(or, equivalently, $f'(t) = f(t)c$) is

$$f(t) = f(0)e^{ct}$$

it can be shown that the solution of the matrix differential Equations (6.39) and (6.40)
is given by

$$\mathbf{P}(t) = \mathbf{P}(0)e^{\mathbf{R}t}$$

Since $\mathbf{P}(0) = \mathbf{I}$ (the identity matrix), this yields that

$$\mathbf{P}(t) = e^{\mathbf{R}t} \tag{6.41}$$

where the matrix $e^{\mathbf{R}t}$ is defined by

$$e^{\mathbf{R}t} = \sum_{n=0}^{\infty} \mathbf{R}^n \frac{t^n}{n!} \tag{6.42}$$

with $\mathbf{R}^n$ being the (matrix) multiplication of $\mathbf{R}$ by itself n times.

The direct use of Equation (6.42) to compute $\mathbf{P}(t)$ turns out to be very inefficient
for two reasons. First, since the matrix $\mathbf{R}$ contains both positive and negative elements
(remember the off-diagonal elements are the $q_{ij}$ while the ith diagonal element is $-v_i$),
there is the problem of computer round-off error when we compute the powers of $\mathbf{R}$.
Second, we often have to compute many of the terms in the infinite sum (6.42) to arrive
at a good approximation. However, there are certain indirect ways that we can utilize
the relation in (6.41) to efficiently approximate the matrix $\mathbf{P}(t)$. We now present two
of these methods.

Approximation Method 1 Rather than using Equation (6.42) to compute $e^{\mathbf{R}t}$, we
can use the matrix equivalent of the identity

$$e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$$

which states that

$$e^{\mathbf{R}t} = \lim_{n \to \infty} \left(\mathbf{I} + \mathbf{R}\frac{t}{n}\right)^n$$

Thus, if we let n be a power of 2, say, $n = 2^k$, then we can approximate $\mathbf{P}(t)$ by raising
the matrix $\mathbf{M} = \mathbf{I} + \mathbf{R}t/n$ to the nth power, which can be accomplished by k matrix
multiplications (by first multiplying $\mathbf{M}$ by itself to obtain $\mathbf{M}^2$ and then multiplying
that by itself to obtain $\mathbf{M}^4$ and so on). In addition, since only the diagonal elements of
$\mathbf{R}$ are negative (and the diagonal elements of the identity matrix $\mathbf{I}$ are equal to 1), by
choosing n large enough we can guarantee that the matrix $\mathbf{I} + \mathbf{R}t/n$ has all nonnegative
elements.

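Approximation Method 2 A second approach to approximating $e^{\mathbf{R}t}$ uses the
identity

$$e^{-\mathbf{R}t} = \lim_{n \to \infty} \left(\mathbf{I} - \mathbf{R}\frac{t}{n}\right)^n \approx \left(\mathbf{I} - \mathbf{R}\frac{t}{n}\right)^n \quad \text{for } n \text{ large}$$

and thus

$$\mathbf{P}(t) = e^{\mathbf{R}t} \approx \left(\mathbf{I} - \mathbf{R}\frac{t}{n}\right)^{-n} = \left[\left(\mathbf{I} - \mathbf{R}\frac{t}{n}\right)^{-1}\right]^n$$

Hence, if we again choose n to be a large power of 2, say, $n = 2^k$, we can approximate
$\mathbf{P}(t)$ by first computing the inverse of the matrix $\mathbf{I} - \mathbf{R}t/n$ and then raising that matrix
to the nth power (by utilizing k matrix multiplications). It can be shown that the matrix
$(\mathbf{I} - \mathbf{R}t/n)^{-1}$ will have only nonnegative elements.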

Remark Both of the preceding computational approaches for approximating $\mathbf{P}(t)$
have probabilistic interpretations (see Exercises 48 and 49).


Exercises 

1. A population of organisms consists of both male and female members. In a small
colony any particular male is likely to mate with any particular female in any
time interval of length h, with probability $\lambda h + o(h)$. Each mating immediately
produces one offspring, equally likely to be male or female. Let $N_1(t)$ and $N_2(t)$
denote the number of males and females in the population at t. Derive the parameters
of the continuous-time Markov chain $\{N_1(t), N_2(t)\}$, i.e., the $v_i$, $P_{ij}$ of
Section 6.2.

*2. Suppose that a one-celled organism can be in one of two states, either A or B. An
individual in state A will change to state B at an exponential rate $\alpha$; an individual
in state B divides into two new individuals of type A at an exponential rate $\beta$.
Define an appropriate continuous-time Markov chain for a population of such
organisms and determine the appropriate parameters for this model.

3. Consider two machines that are maintained by a single repairman. Machine i
functions for an exponential time with rate $\mu_i$ before breaking down, $i = 1, 2$.
The repair times (for either machine) are exponential with rate $\mu$. Can we analyze
this as a birth and death process? If so, what are the parameters? If not, how can
we analyze it?

*4. Potential customers arrive at a single-server station in accordance with a Poisson
process with rate $\lambda$. However, if the arrival finds n customers already in the station,
then he will enter the system with probability $\alpha_n$. Assuming an exponential service
rate $\mu$, set this up as a birth and death process and determine the birth and death
rates.

5. There are N individuals in a population, some of whom have a certain infection
that spreads as follows. Contacts between two members of this population occur
in accordance with a Poisson process having rate $\lambda$. When a contact occurs, it is
equally likely to involve any of the $\binom{N}{2}$ pairs of individuals in the population. If a
contact involves an infected and a noninfected individual, then with probability p
the noninfected individual becomes infected. Once infected, an individual remains
infected throughout. Let $X(t)$ denote the number of infected members of the
population at time t.

(a) Is $\{X(t), t \geq 0\}$ a continuous-time Markov chain?

(b) Specify its type. 

(c) Starting with a single infected individual, what is the expected time until all 
members are infected? 

6. Consider a birth and death process with birth rates $\lambda_i = (i + 1)\lambda$, $i \geq 0$, and
death rates $\mu_i = i\mu$, $i \geq 0$.

(a) Determine the expected time to go from state 0 to state 4. 

(b) Determine the expected time to go from state 2 to state 5. 

(c) Determine the variances in parts (a) and (b). 

*7. Individuals join a club in accordance with a Poisson process with rate $\lambda$. Each
new member must pass through k consecutive stages to become a full member
of the club. The time it takes to pass through each stage is exponentially distributed
with rate $\mu$. Let $N_i(t)$ denote the number of club members at time t
who have passed through exactly i stages, $i = 1, \ldots, k - 1$. Also, let
$\mathbf{N}(t) = (N_1(t), N_2(t), \ldots, N_{k-1}(t))$.

(a) Is $\{\mathbf{N}(t), t \geq 0\}$ a continuous-time Markov chain?

(b) If so, give the infinitesimal transition rates. That is, for any state $\mathbf{n} = (n_1, \ldots,
n_{k-1})$ give the possible next states along with their infinitesimal rates.

8. Consider two machines, both of which have an exponential lifetime with mean
$1/\lambda$. There is a single repairman that can service machines at an exponential rate
$\mu$. Set up the Kolmogorov backward equations; you need not solve them.

9. The birth and death process with parameters $\lambda_n = 0$ and $\mu_n = \mu$, $n > 0$ is called
a pure death process. Find $P_{ij}(t)$.

10. Consider two machines. Machine i operates for an exponential time with rate $\lambda_i$
and then fails; its repair time is exponential with rate $\mu_i$, $i = 1, 2$. The machines
act independently of each other. Define a four-state continuous-time Markov
chain that jointly describes the condition of the two machines. Use the assumed
independence to compute the transition probabilities for this chain and then verify
that these transition probabilities satisfy the forward and backward equations.

*11. Consider a Yule process starting with a single individual; that is, suppose $X(0) = 1$.
Let $T_i$ denote the time it takes the process to go from a population of size i to
one of size $i + 1$.

(a) Argue that $T_i$, $i = 1, \ldots, j$, are independent exponentials with respective
rates $i\lambda$.

(b) Let $X_1, \ldots, X_j$ denote independent exponential random variables each having
rate $\lambda$, and interpret $X_i$ as the lifetime of component i. Argue that
$\max(X_1, \ldots, X_j)$ can be expressed as

$$\max(X_1, \ldots, X_j) = \varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_j$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_j$ are independent exponentials with respective rates $j\lambda$,
$(j - 1)\lambda, \ldots, \lambda$.

Hint: Interpret $\varepsilon_i$ as the time between the $(i - 1)$st and the ith failure.

(c) Using (a) and (b) argue that

$$P\{T_1 + \cdots + T_j \leq t\} = (1 - e^{-\lambda t})^j$$

(d) Use (c) to obtain

$$P_{1j}(t) = (1 - e^{-\lambda t})^{j-1} - (1 - e^{-\lambda t})^j = e^{-\lambda t}(1 - e^{-\lambda t})^{j-1}$$

and hence, given $X(0) = 1$, $X(t)$ has a geometric distribution with parameter
$p = e^{-\lambda t}$.

(e) Now conclude that

$$P_{ij}(t) = \binom{j-1}{i-1} e^{-\lambda t i}(1 - e^{-\lambda t})^{j-i}$$

12. Each individual in a biological population is assumed to give birth at an exponential
rate $\lambda$, and to die at an exponential rate $\mu$. In addition, there is an exponential
rate of increase $\theta$ due to immigration. However, immigration is not allowed when
the population size is N or larger.

(a) Set this up as a birth and death model.

(b) If $N = 3$, $1 = \theta = \lambda$, $\mu = 2$, determine the proportion of time that immigration
is restricted.

13. A small barbershop, operated by a single barber, has room for at most two customers.
Potential customers arrive at a Poisson rate of three per hour, and the successive
service times are independent exponential random variables with mean 1/4
hour.

(a) What is the average number of customers in the shop?

(b) What is the proportion of potential customers that enter the shop? 

(c) If the barber could work twice as fast, how much more business would he do? 

14. Potential customers arrive at a full-service, one-pump gas station at a Poisson 
rate of 20 cars per hour. However, customers will only enter the station for gas if 
there are no more than two cars (including the one currently being attended to) at 
the pump. Suppose the amount of time required to service a car is exponentially 
distributed with a mean of five minutes. 

(a) What fraction of the attendant’s time will be spent servicing cars? 

(b) What fraction of potential customers are lost? 

15. A service center consists of two servers, each working at an exponential rate of 
two services per hour. If customers arrive at a Poisson rate of three per hour, then, 
assuming a system capacity of at most three customers, 

(a) what fraction of potential customers enter the system? 

(b) what would the value of part (a) be if there was only a single server, and his
rate was twice as fast (that is, $\mu = 4$)?

*16. The following problem arises in molecular biology. The surface of a bacterium 
consists of several sites at which foreign molecules—some acceptable and some 
not—become attached. We consider a particular site and assume that molecules 
arrive at the site according to a Poisson process with parameter $\lambda$. Among these
molecules a proportion $\alpha$ is acceptable. Unacceptable molecules stay at the site
for a length of time that is exponentially distributed with parameter $\mu_1$, whereas
an acceptable molecule remains at the site for an exponential time with rate $\mu_2$. An
arriving molecule will become attached only if the site is free of other molecules. 
What percentage of time is the site occupied with an acceptable (unacceptable) 
molecule? 

17. Each time a machine is repaired it remains up for an exponentially distributed
time with rate $\lambda$. It then fails, and its failure is either of two types. If it is a type
1 failure, then the time to repair the machine is exponential with rate $\mu_1$; if it
is a type 2 failure, then the repair time is exponential with rate $\mu_2$. Each failure
is, independently of the time it took the machine to fail, a type 1 failure with 
probability p and a type 2 failure with probability 1 — p. What proportion of time 
is the machine down due to a type 1 failure? What proportion of time is it down 
due to a type 2 failure? What proportion of time is it up? 

18. After being repaired, a machine functions for an exponential time with rate $\lambda$
and then fails. Upon failure, a repair process begins. The repair process proceeds 
sequentially through k distinct phases. First a phase 1 repair must be performed, 
then a phase 2, and so on. The times to complete these phases are independent, 
with phase i taking an exponential time with rate $\mu_i$, $i = 1, \ldots, k$.

(a) What proportion of time is the machine undergoing a phase i repair?

(b) What proportion of time is the machine working?

*19. A single repairperson looks after both machines 1 and 2. Each time it is repaired,
machine i stays up for an exponential time with rate $\lambda_i$, $i = 1, 2$. When machine
i fails, it requires an exponentially distributed amount of work with rate $\mu_i$ to
complete its repair. The repairperson will always service machine 1 when it is
down. For instance, if machine 1 fails while 2 is being repaired, then the repairperson
will immediately stop work on machine 2 and start on 1. What proportion
of time is machine 2 down?

20. There are two machines, one of which is used as a spare. A working machine 
will function for an exponential time with rate $\lambda$ and will then fail. Upon failure,
it is immediately replaced by the other machine if that one is in working order,
and it goes to the repair facility. The repair facility consists of a single person
who takes an exponential time with rate $\mu$ to repair a failed machine. At the
repair facility, the newly failed machine enters service if the repairperson is free. 
If the repairperson is busy, it waits until the other machine is fixed; at that time, 
the newly repaired machine is put in service and repair begins on the other one. 
Starting with both machines in working condition, find 

(a) the expected value and 

(b) the variance of the time until both are in the repair facility. 

(c) In the long run, what proportion of time is there a working machine? 

21. Suppose that when both machines are down in Exercise 20 a second repairperson 
is called in to work on the newly failed one. Suppose all repair times remain 
exponential with rate $\mu$. Now find the proportion of time at least one machine is
working, and compare your answer with the one obtained in Exercise 20. 

22. Customers arrive at a single-server queue in accordance with a Poisson process
having rate $\lambda$. However, an arrival that finds n customers already in the system will
only join the system with probability $1/(n + 1)$. That is, with probability $n/(n + 1)$
such an arrival will not join the system. Show that the limiting distribution of the
number of customers in the system is Poisson with mean $\lambda/\mu$.

23. A job shop consists of three machines and two repairmen. The amount of time a 
machine works before breaking down is exponentially distributed with mean 10. 
If the amount of time it takes a single repairman to fix a machine is exponentially 
distributed with mean 8, then 

(a) what is the average number of machines not in use? 

(b) what proportion of time are both repairmen busy? 

*24. Consider a taxi station where taxis and customers arrive in accordance with Poisson
processes with respective rates of one and two per minute. A taxi will wait
no matter how many other taxis are present. However, an arriving customer that 
does not find a taxi waiting leaves. Find 

(a) the average number of taxis waiting, and 

(b) the proportion of arriving customers that get taxis. 

25. Customers arrive at a service station, manned by a single server who serves at an
exponential rate $\mu_1$, at a Poisson rate $\lambda$. After completion of service the customer
then joins a second system where the server serves at an exponential rate $\mu_2$.
Such a system is called a tandem or sequential queueing system. Assuming that
$\lambda < \mu_i$, $i = 1, 2$, determine the limiting probabilities.

Hint: Try a solution of the form $P_{n,m} = C\alpha^n\beta^m$, and determine $C$, $\alpha$, $\beta$.

26. Consider an ergodic M/M/s queue in steady state (that is, after a long time) and 
argue that the number presently in the system is independent of the sequence of 
past departure times. That is, for instance, knowing that there have been departures 
2, 3, 5, and 10 time units ago does not affect the distribution of the number 
presently in the system. 

27. In the M/M/s queue if you allow the service rate to depend on the number in 
the system (but in such a way so that it is ergodic), what can you say about the 
output process? What can you say when the service rate $\mu$ remains unchanged
but $\lambda > s\mu$?

*28. If $\{X(t)\}$ and $\{Y(t)\}$ are independent continuous-time Markov chains, both of
which are time reversible, show that the process $\{X(t), Y(t)\}$ is also a time
reversible Markov chain.

29. Consider a set of n machines and a single repair facility to service these machines.
Suppose that when machine i, $i = 1, \ldots, n$, fails it requires an exponentially
distributed amount of work with rate $\mu_i$ to repair it. The repair facility divides its
efforts equally among all failed machines in the sense that whenever there are k
failed machines each one receives work at a rate of $1/k$ per unit time. If there are a
total of r working machines, including machine i, then i fails at an instantaneous
rate $\lambda_i/r$.

(a) Define an appropriate state space so as to be able to analyze the preceding
system as a continuous-time Markov chain.

(b) Give the instantaneous transition rates (that is, give the $q_{ij}$).

(c) Write the time reversibility equations. 

(d) Find the limiting probabilities and show that the process is time reversible. 

30. Consider a graph with nodes $1, 2, \ldots, n$ and the $\binom{n}{2}$ arcs $(i, j)$, $i \neq j$, $i, j =
1, \ldots, n$. (See Section 3.6.2 for appropriate definitions.) Suppose that a particle
moves along this graph as follows: Events occur along the arcs $(i, j)$ according
to independent Poisson processes with rates $\lambda_{ij}$. An event along arc $(i, j)$ causes
that arc to become excited. If the particle is at node i at the moment that $(i, j)$
becomes excited, it instantaneously moves to node j, $i, j = 1, \ldots, n$. Let $P_j$
denote the proportion of time that the particle is at node j. Show that

$$P_j = \frac{1}{n}$$

Hint: Use time reversibility.

31. A total of N customers move about among r servers in the following manner.
When a customer is served by server i, he then goes over to server j, $j \neq i$, with
probability $1/(r - 1)$. If the server he goes to is free, then the customer enters
service; otherwise he joins the queue. The service times are all independent, with
the service times at server i being exponential with rate $\mu$, $i = 1, \ldots, r$. Let the
state at any time be the vector $(n_1, \ldots, n_r)$, where $n_i$ is the number of customers
presently at server i, $i = 1, \ldots, r$, $\sum_{i=1}^{r} n_i = N$.

(a) Argue that if $X(t)$ is the state at time t, then $\{X(t), t \geq 0\}$ is a continuous-time
Markov chain. 

(b) Give the infinitesimal rates of this chain. 

(c) Show that this chain is time reversible, and find the limiting probabilities. 

32. Customers arrive at a two-server station in accordance with a Poisson process
having rate $\lambda$. Upon arriving, they join a single queue. Whenever a server completes
a service, the person first in line enters service. The service times of server
i are exponential with rate $\mu_i$, $i = 1, 2$, where $\mu_1 + \mu_2 > \lambda$. An arrival finding
both servers free is equally likely to go to either one. Define an appropriate
continuous-time Markov chain for this model, show it is time reversible, and find
the limiting probabilities.

*33. Consider two M/M/1 queues with respective parameters $\lambda_i$, $\mu_i$, $i = 1, 2$. Suppose
they share a common waiting room that can hold at most three customers.
That is, whenever an arrival finds her server busy and three customers in the waiting
room, she goes away. Find the limiting probability that there will be n queue
1 customers and m queue 2 customers in the system.

Hint: Use the results of Exercise 28 together with the concept of truncation. 

34. Four workers share an office that contains four telephones. At any time, each 
worker is either “working” or “on the phone.” Each “working” period of worker 
i lasts for an exponentially distributed time with rate $\lambda_i$, and each “on the phone”
period lasts for an exponentially distributed time with rate $\mu_i$, $i = 1, 2, 3, 4$.

(a) What proportion of time are all workers “working”?

Let $X_i(t)$ equal 1 if worker i is working at time t, and let it be 0 otherwise.
Let $\mathbf{X}(t) = (X_1(t), X_2(t), X_3(t), X_4(t))$.

(b) Argue that $\{\mathbf{X}(t), t \geq 0\}$ is a continuous-time Markov chain and give its
infinitesimal rates.

(c) Is $\{\mathbf{X}(t)\}$ time reversible? Why or why not?

Suppose now that one of the phones has broken down. Suppose that a worker 
who is about to use a phone but finds them all being used begins a new “working” 
period. 

(d) What proportion of time are all workers “working”? 

35. Consider a time reversible continuous-time Markov chain having infinitesimal
transition rates $q_{ij}$ and limiting probabilities $\{P_i\}$. Let A denote a set of states
for this chain, and consider a new continuous-time Markov chain with transition
rates $q^*_{ij}$ given by

$$q^*_{ij} = \begin{cases} c\,q_{ij}, & \text{if } i \in A,\ j \notin A \\ q_{ij}, & \text{otherwise} \end{cases}$$

where c is an arbitrary positive number. Show that this chain remains time
reversible, and find its limiting probabilities.

36. Consider a system of n components such that the working times of component
i, $i = 1, \ldots, n$, are exponentially distributed with rate $\lambda_i$. When a component
fails, however, the repair rate of component i depends on how many other components
are down. Specifically, suppose that the instantaneous repair rate of component
i, $i = 1, \ldots, n$, when there are a total of k failed components, is $\alpha^k \mu_i$.

(a) Explain how we can analyze the preceding as a continuous-time Markov 
chain. Define the states and give the parameters of the chain. 

(b) Show that, in steady state, the chain is time reversible and compute the limiting 
probabilities. 

37. A hospital accepts k different types of patients, where type i patients arrive according
to a Poisson process with rate $\lambda_i$, with these k Poisson processes being
independent. Type i patients spend an exponentially distributed length of time
with rate $\mu_i$ in the hospital, $i = 1, \ldots, k$. Suppose that each type i patient in
the hospital requires $w_i$ units of resources, and that the hospital will not accept a
new patient if it would result in the total of all patients' resource needs exceeding
the amount C. Consequently, it is possible to have $n_1$ type 1 patients, $n_2$ type 2
patients, ..., and $n_k$ type k patients in the hospital at the same time if and only if

$$\sum_{i=1}^{k} n_i w_i \leq C$$

(a) Define a continuous-time Markov chain to analyze the preceding.

For parts (b), (c), and (d) suppose that $C = \infty$.

(b) If $N_i(t)$ is the number of type i customers in the system at time t, what type
of process is $\{N_i(t), t \geq 0\}$? Is it time reversible?

(c) What can be said about the vector process $\{(N_1(t), \ldots, N_k(t)), t \geq 0\}$?

(d) What are the limiting probabilities of the process of part (c)?

For the remaining parts assume that $C < \infty$.

(e) Find the limiting probabilities for the Markov chain of part (a). 

(f) At what rate are type i patients admitted? 

(g) What fraction of patients are admitted? 

38. Consider an n server system where the service times of server i are exponentially
distributed with rate $\mu_i$, $i = 1, \ldots, n$. Suppose customers arrive in accordance
with a Poisson process with rate 1, and that an arrival who finds all servers busy 
does not enter but goes elsewhere. Suppose that an arriving customer who finds 
at least one idle server is served by a randomly chosen one of that group; that is, 
an arrival finding k idle servers is equally likely to be served by any of these k. 


(a) Define states so as to analyze the preceding as a continuous-time Markov
chain.

(b) Show that this chain is time reversible.

(c) Find the limiting probabilities. 

39. Suppose in Exercise 38 that an entering customer is served by the server who has 
been idle the shortest amount of time. 

(a) Define states so as to analyze this model as a continuous-time Markov chain. 

(b) Show that this chain is time reversible. 

(c) Find the limiting probabilities. 

40. Consider a continuous-time Markov chain with states 1,..., n, which spends an 
exponential time with rate $v_i$ in state i during each visit to that state and is then
equally likely to go to any of the other n — 1 states. 

(a) Is this chain time reversible? 

(b) Find the long-run proportions of time it spends in each state. 

41. Show in Example 6.22 that the limiting probabilities satisfy Equations (6.33), 
(6.34), and (6.35). 

42. In Example 6.22 explain why we would have known before analyzing Example
6.22 that the limiting probability there are j customers with server i is
$(\lambda/\mu_i)^j (1 - \lambda/\mu_i)$, $i = 1, 2$, $j \geq 0$. (What we would not have known was that the number
of customers at the two servers would, in steady state, be independent.)

43. Consider a sequential queueing model with three servers, where customers arrive
at server 1 in accordance with a Poisson process with rate $\lambda$. After completion
at server 1 the customer then moves to server 2; after a service completion at
server 2 the customer moves to server 3; after a service completion at server 3
the customer departs the system. Assuming that the service times at server i are
exponential with rate $\mu_i$, $i = 1, 2, 3$, find the limiting probabilities of this system
by guessing at the reverse chain and then verifying that your guess is correct.

44. For the continuous-time Markov chain of Exercise 3 present a uniformized version.

45. In Example 6.20, we computed m{t) — E[ 0(f)], the expected occupation time in 
state 0 by time t for the two-state continuous-time Markov chain starting in state 
0. Another way of obtaining this quantity is by deriving a differential equation 
for it. 

(a) Show that 

m(t + h ) = m{t) + Poo(t)h + o(h ) 


(b) Show that 


m'{t) 



A „-q+A0r 

X + fi 


(c) Solve for m(t). 
46. Let $O(t)$ be the occupation time for state 0 in the two-state continuous-time
Markov chain. Find $E[O(t) \mid X(0) = 1]$.

47. Consider the two-state continuous-time Markov chain. Starting in state 0, find 
Cov[X(s), X(t)]. 

48. Let $Y$ denote an exponential random variable with rate $\lambda$ that is independent of
the continuous-time Markov chain $\{X(t)\}$ and let

\[ \bar{P}_{ij} = P\{X(Y) = j \mid X(0) = i\} \]

(a) Show that

\[ \bar{P}_{ij} = \frac{1}{v_i + \lambda} \sum_{k} q_{ik} \bar{P}_{kj} + \frac{\lambda}{v_i + \lambda} \delta_{ij} \]

where $\delta_{ij}$ is 1 when $i = j$ and 0 when $i \ne j$.

(b) Show that the solution of the preceding set of equations is given by

\[ \bar{P} = (I - R/\lambda)^{-1} \]

where $\bar{P}$ is the matrix of elements $\bar{P}_{ij}$, $I$ is the identity matrix, and $R$ the
matrix specified in Section 6.9.

(c) Suppose now that $Y_1, \ldots, Y_n$ are independent exponentials with rate $\lambda$ that
are independent of $\{X(t)\}$. Show that

\[ P\{X(Y_1 + \cdots + Y_n) = j \mid X(0) = i\} \]

is equal to the element in row $i$, column $j$ of the matrix $\bar{P}^n$.

(d) Explain the relationship of the preceding to Approximation 2 of Section 6.9. 

*49. (a) Show that Approximation 1 of Section 6.9 is equivalent to uniformizing the
continuous-time Markov chain with a value $v$ such that $vt = n$ and then
approximating $P_{ij}(t)$ by $P^{*n}_{ij}$.

(b) Explain why the preceding should make a good approximation.

Hint: What is the standard deviation of a Poisson random variable with
mean $n$?
Renewal Theory and
Its Applications 



7.1 Introduction 

We have seen that a Poisson process is a counting process for which the times between 
successive events are independent and identically distributed exponential random vari¬ 
ables. One possible generalization is to consider a counting process for which the times 
between successive events are independent and identically distributed with an arbitrary 
distribution. Such a counting process is called a renewal process. 

Let $\{N(t), t \ge 0\}$ be a counting process and let $X_n$ denote the time between the
$(n - 1)$st and the $n$th event of this process, $n \ge 1$.

Definition 7.1 If the sequence of nonnegative random variables $\{X_1, X_2, \ldots\}$ is
independent and identically distributed, then the counting process $\{N(t), t \ge 0\}$ is
said to be a renewal process.

Thus, a renewal process is a counting process such that the time until the first 
event occurs has some distribution F , the time between the first and second event has, 
independently of the time of the first event, the same distribution F, and so on. When 
an event occurs, we say that a renewal has taken place. 

For an example of a renewal process, suppose that we have an infinite supply of 
lightbulbs whose lifetimes are independent and identically distributed. Suppose also 
that we use a single lightbulb at a time, and when it fails we immediately replace it 
with a new one. Under these conditions, $\{N(t), t \ge 0\}$ is a renewal process when $N(t)$
represents the number of lightbulbs that have failed by time $t$.


Introduction to Probability Models, Eleventh Edition. http://dx.doi.org/10.1016/B978-0-12-407948-9.00007-4 
Figure 7.1 Renewal and interarrival times.


For a renewal process having interarrival times $X_1, X_2, \ldots$, let

\[ S_0 = 0, \qquad S_n = \sum_{i=1}^{n} X_i, \quad n \ge 1 \]

That is, $S_1 = X_1$ is the time of the first renewal; $S_2 = X_1 + X_2$ is the time until the first
renewal plus the time between the first and second renewal, that is, $S_2$ is the time of the
second renewal. In general, $S_n$ denotes the time of the $n$th renewal (see Figure 7.1).

We shall let $F$ denote the interarrival distribution and, to avoid trivialities, we assume
that $F(0) = P\{X_n = 0\} < 1$. Furthermore, we let

\[ \mu = E[X_n], \quad n \ge 1 \]

be the mean time between successive renewals. It follows from the nonnegativity of
$X_n$ and the fact that $X_n$ is not identically 0 that $\mu > 0$.

The first question we shall attempt to answer is whether an infinite number of
renewals can occur in a finite amount of time. That is, can $N(t)$ be infinite for some
(finite) value of $t$? To show that this cannot occur, we first note that, as $S_n$ is the time
of the $n$th renewal, $N(t)$ may be written as

\[ N(t) = \max\{n : S_n \le t\} \tag{7.1} \]

To understand why Equation (7.1) is valid, suppose, for instance, that $S_4 \le t$ but $S_5 > t$.
Hence, the fourth renewal had occurred by time $t$ but the fifth renewal occurred after
time $t$; or in other words, $N(t)$, the number of renewals that occurred by time $t$, must
equal 4. Now by the strong law of large numbers it follows that, with probability 1,

\[ \frac{S_n}{n} \to \mu \quad \text{as } n \to \infty \]

But since $\mu > 0$ this means that $S_n$ must be going to infinity as $n$ goes to infinity. Thus,
$S_n$ can be less than or equal to $t$ for at most a finite number of values of $n$, and hence
by Equation (7.1), $N(t)$ must be finite.

However, though $N(t) < \infty$ for each $t$, it is true that, with probability 1,

\[ N(\infty) = \lim_{t \to \infty} N(t) = \infty \]

This follows since the only way in which $N(\infty)$, the total number of renewals that
occur, can be finite is for one of the interarrival times to be infinite.
Therefore,

\[ \begin{aligned}
P\{N(\infty) < \infty\} &= P\{X_n = \infty \text{ for some } n\} \\
&= P\left\{ \bigcup_{n=1}^{\infty} \{X_n = \infty\} \right\} \\
&\le \sum_{n=1}^{\infty} P\{X_n = \infty\} \\
&= 0
\end{aligned} \]


7.2 Distribution of N(t)


The distribution of $N(t)$ can be obtained, at least in theory, by first noting the important
relationship that the number of renewals by time $t$ is greater than or equal to $n$ if and
only if the $n$th renewal occurs before or at time $t$. That is,

\[ N(t) \ge n \iff S_n \le t \tag{7.2} \]

From Equation (7.2) we obtain

\[ \begin{aligned}
P\{N(t) = n\} &= P\{N(t) \ge n\} - P\{N(t) \ge n + 1\} \\
&= P\{S_n \le t\} - P\{S_{n+1} \le t\}
\end{aligned} \tag{7.3} \]

Now, since the random variables $X_i$, $i \ge 1$, are independent and have a common
distribution $F$, it follows that $S_n = \sum_{i=1}^{n} X_i$ is distributed as $F_n$, the $n$-fold convolution
of $F$ with itself (Section 2.5). Therefore, from Equation (7.3) we obtain

\[ P\{N(t) = n\} = F_n(t) - F_{n+1}(t) \]


Example 7.1 Suppose that $P\{X_n = i\} = p(1 - p)^{i-1}$, $i \ge 1$. That is, suppose
that the interarrival distribution is geometric. Now $S_1 = X_1$ may be interpreted as the
number of trials necessary to get a single success when each trial is independent and has
a probability $p$ of being a success. Similarly, $S_n$ may be interpreted as the number of
trials necessary to attain $n$ successes, and hence has the negative binomial distribution

\[ P\{S_n = k\} = \begin{cases} \binom{k-1}{n-1} p^n (1 - p)^{k-n}, & k \ge n \\ 0, & k < n \end{cases} \]

Thus, from Equation (7.3) we have that

\[ P\{N(t) = n\} = \sum_{k=n}^{[t]} \binom{k-1}{n-1} p^n (1 - p)^{k-n} - \sum_{k=n+1}^{[t]} \binom{k-1}{n} p^{n+1} (1 - p)^{k-n-1} \]

Equivalently, since an event independently occurs with probability $p$ at each of the
times $1, 2, \ldots$,

\[ P\{N(t) = n\} = \binom{[t]}{n} p^n (1 - p)^{[t]-n} \]

where $[t]$ denotes the largest integer not exceeding $t$. ■
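The two expressions for $P\{N(t) = n\}$ in Example 7.1 can be compared numerically. The following is a minimal sketch, not part of the original text; the function names are illustrative assumptions. It evaluates Equation (7.3) as a difference of negative binomial distribution functions and compares it with the binomial probability.

```python
from math import comb


def neg_binomial_cdf(n, t, p):
    """P{S_n <= t}: the n-th success occurs on or before trial floor(t)."""
    if n == 0:
        return 1.0  # S_0 = 0 by convention, so S_0 <= t for every t >= 0
    return sum(comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)
               for k in range(n, int(t) + 1))


def pmf_via_equation_7_3(n, t, p):
    """P{N(t) = n} = P{S_n <= t} - P{S_{n+1} <= t}."""
    return neg_binomial_cdf(n, t, p) - neg_binomial_cdf(n + 1, t, p)


def pmf_binomial(n, t, p):
    """P{N(t) = n} as a binomial probability with floor(t) trials."""
    m = int(t)
    return comb(m, n) * p**n * (1 - p) ** (m - n) if n <= m else 0.0


if __name__ == "__main__":
    for n in range(5):
        print(n, pmf_via_equation_7_3(n, 10.7, 0.3), pmf_binomial(n, 10.7, 0.3))
```

The two columns printed for each $n$ should agree up to floating-point rounding.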

Another expression for $P(N(t) = n)$ can be obtained by conditioning on $S_n$. This
yields

\[ P(N(t) = n) = \int_0^{\infty} P(N(t) = n \mid S_n = y) f_{S_n}(y)\,dy \]

Now, if the $n$th event occurred at time $y > t$, then there would have been fewer than $n$
events by time $t$. On the other hand, if it occurred at a time $y \le t$, then there would be
exactly $n$ events by time $t$ if the next interarrival exceeds $t - y$. Consequently,

\[ \begin{aligned}
P(N(t) = n) &= \int_0^{t} P(X_{n+1} > t - y \mid S_n = y) f_{S_n}(y)\,dy \\
&= \int_0^{t} \bar{F}(t - y) f_{S_n}(y)\,dy
\end{aligned} \]

where $\bar{F} = 1 - F$.

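Example 7.2 If $F(x) = 1 - e^{-\lambda x}$ then $S_n$, being the sum of $n$ independent exponentials
with rate $\lambda$, will have a gamma $(n, \lambda)$ distribution. Consequently, the preceding identity
gives

\[ \begin{aligned}
P(N(t) = n) &= \int_0^{t} e^{-\lambda(t - y)} \frac{\lambda e^{-\lambda y} (\lambda y)^{n-1}}{(n - 1)!}\,dy \\
&= \frac{\lambda^n e^{-\lambda t}}{(n - 1)!} \int_0^{t} y^{n-1}\,dy \\
&= e^{-\lambda t} \frac{(\lambda t)^n}{n!}
\end{aligned} \]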


By using Equation (7.2) we can calculate $m(t)$, the mean value of $N(t)$, as

\[ \begin{aligned}
m(t) &= E[N(t)] \\
&= \sum_{n=1}^{\infty} P\{N(t) \ge n\} \\
&= \sum_{n=1}^{\infty} P\{S_n \le t\} \\
&= \sum_{n=1}^{\infty} F_n(t)
\end{aligned} \]

where we have used the fact that if $X$ is nonnegative and integer valued, then

\[ \begin{aligned}
E[X] &= \sum_{k=1}^{\infty} k P\{X = k\} = \sum_{k=1}^{\infty} \sum_{n=1}^{k} P\{X = k\} \\
&= \sum_{n=1}^{\infty} \sum_{k=n}^{\infty} P\{X = k\} = \sum_{n=1}^{\infty} P\{X \ge n\}
\end{aligned} \]

The function $m(t)$ is known as the mean-value or the renewal function.

It can be shown that the mean-value function $m(t)$ uniquely determines the renewal
process. Specifically, there is a one-to-one correspondence between the interarrival
distributions $F$ and the mean-value functions $m(t)$.

Another interesting result that we state without proof is that

\[ m(t) < \infty \quad \text{for all } t < \infty \]

Remarks 

(i) Since m(t) uniquely determines the interarrival distribution, it follows that the 
Poisson process is the only renewal process having a linear mean-value function. 

(ii) Some readers might think that the finiteness of $m(t)$ should follow directly from
the fact that, with probability 1, $N(t)$ is finite. However, such reasoning is not valid;
consider the following: Let $Y$ be a random variable having the following probability
distribution:

\[ Y = 2^n \quad \text{with probability } \left(\tfrac{1}{2}\right)^n, \quad n \ge 1 \]

Now,

\[ P\{Y < \infty\} = \sum_{n=1}^{\infty} P\{Y = 2^n\} = \sum_{n=1}^{\infty} \left(\tfrac{1}{2}\right)^n = 1 \]

But

\[ E[Y] = \sum_{n=1}^{\infty} 2^n P\{Y = 2^n\} = \sum_{n=1}^{\infty} 2^n \left(\tfrac{1}{2}\right)^n = \infty \]

Hence, even when $Y$ is finite, it can still be true that $E[Y] = \infty$.

An integral equation satisfied by the renewal function can be obtained by conditioning
on the time of the first renewal. Assuming that the interarrival distribution $F$ is
continuous with density function $f$ this yields

\[ m(t) = E[N(t)] = \int_0^{\infty} E[N(t) \mid X_1 = x] f(x)\,dx \tag{7.4} \]

Now suppose that the first renewal occurs at a time $x$ that is less than $t$. Then, using
the fact that a renewal process probabilistically starts over when a renewal occurs, it
follows that the number of renewals by time $t$ would have the same distribution as 1
plus the number of renewals in the first $t - x$ time units. Therefore,

\[ E[N(t) \mid X_1 = x] = 1 + E[N(t - x)] \quad \text{if } x < t \]

Since, clearly,

\[ E[N(t) \mid X_1 = x] = 0 \quad \text{when } x > t \]

we obtain from Equation (7.4) that

\[ \begin{aligned}
m(t) &= \int_0^{t} [1 + m(t - x)] f(x)\,dx \\
&= F(t) + \int_0^{t} m(t - x) f(x)\,dx
\end{aligned} \tag{7.5} \]


Equation (7.5) is called the renewal equation and can sometimes be solved to obtain 
the renewal function. 

Example 7.3 One instance in which the renewal equation can be solved is when
the interarrival distribution is uniform, say, uniform on (0, 1). We will now present a
solution in this case when $t \le 1$. For such values of $t$, the renewal function becomes

\[ \begin{aligned}
m(t) &= t + \int_0^{t} m(t - x)\,dx \\
&= t + \int_0^{t} m(y)\,dy \quad \text{by the substitution } y = t - x
\end{aligned} \]

Differentiating the preceding equation yields

\[ m'(t) = 1 + m(t) \]

Letting $h(t) = 1 + m(t)$, we obtain

\[ h'(t) = h(t) \]

or

\[ \log h(t) = t + C \]

or

\[ h(t) = Ke^{t} \]

or

\[ m(t) = Ke^{t} - 1 \]

Since $m(0) = 0$, we see that $K = 1$, and so we obtain

\[ m(t) = e^{t} - 1, \quad 0 \le t \le 1 \]
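The closed form of Example 7.3 can be checked by solving the renewal equation numerically. The sketch below is an illustration rather than the text's method: it assumes uniform (0, 1) interarrival times and $t \le 1$, so that the renewal equation reduces to $m(t) = t + \int_0^t m(y)\,dy$, and it discretizes the integral with the trapezoidal rule. The function name and grid size are arbitrary choices.

```python
import math


def renewal_function_uniform(t_max=1.0, steps=1000):
    """Approximate m(t) on [0, t_max], t_max <= 1, for uniform(0, 1) interarrivals.

    Discretizes m(t) = t + integral_0^t m(y) dy with the trapezoidal rule and
    solves for m at each grid point in turn.
    """
    if not 0 < t_max <= 1:
        raise ValueError("this simplified form of the equation needs 0 < t_max <= 1")
    h = t_max / steps
    m = [0.0] * (steps + 1)        # m[0] = m(0) = 0
    interior = 0.0                 # h * (m_1 + ... + m_{i-1}); the m_0/2 term is 0
    for i in range(1, steps + 1):
        t = i * h
        # m_i = t_i + interior + (h / 2) * m_i, solved for m_i
        m[i] = (t + interior) / (1 - h / 2)
        interior += h * m[i]
    return m


if __name__ == "__main__":
    m = renewal_function_uniform()
    print("numerical m(1):", m[-1])
    print("e - 1         :", math.e - 1)
```

With a fine grid the numerical value at $t = 1$ should be close to $e - 1$.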
7.3 Limit Theorems and Their Applications


We have shown previously that, with probability 1, $N(t)$ goes to infinity as $t$ goes to
infinity. However, it would be nice to know the rate at which $N(t)$ goes to infinity. That
is, we would like to be able to say something about $\lim_{t \to \infty} N(t)/t$.

As a prelude to determining the rate at which $N(t)$ grows, let us first consider
the random variable $S_{N(t)}$. In words, just what does this random variable represent?
Proceeding inductively suppose, for instance, that $N(t) = 3$. Then $S_{N(t)} = S_3$ represents
the time of the third event. Since there are only three events that have occurred by time
$t$, $S_3$ also represents the time of the last event prior to (or at) time $t$. This is, in fact, what
$S_{N(t)}$ represents, namely, the time of the last renewal prior to or at time $t$. Similar
reasoning leads to the conclusion that $S_{N(t)+1}$ represents the time of the first renewal
after time $t$ (see Figure 7.2). We now are ready to prove the following.

Proposition 7.1 With probability 1,

\[ \frac{N(t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty \]

Proof. Since $S_{N(t)}$ is the time of the last renewal prior to or at time $t$, and $S_{N(t)+1}$ is
the time of the first renewal after time $t$, we have

\[ S_{N(t)} \le t < S_{N(t)+1} \]

or

\[ \frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{S_{N(t)+1}}{N(t)} \tag{7.6} \]

However, since $S_{N(t)}/N(t) = \sum_{i=1}^{N(t)} X_i / N(t)$ is the average of $N(t)$ independent and
identically distributed random variables, it follows by the strong law of large numbers
that $S_{N(t)}/N(t) \to \mu$ as $N(t) \to \infty$. But since $N(t) \to \infty$ when $t \to \infty$, we obtain

\[ \frac{S_{N(t)}}{N(t)} \to \mu \quad \text{as } t \to \infty \]

Furthermore, writing

\[ \frac{S_{N(t)+1}}{N(t)} = \left( \frac{S_{N(t)+1}}{N(t) + 1} \right) \left( \frac{N(t) + 1}{N(t)} \right) \]

we have that $S_{N(t)+1}/(N(t) + 1) \to \mu$ by the same reasoning as before and

\[ \frac{N(t) + 1}{N(t)} \to 1 \quad \text{as } t \to \infty \]


Figure 7.2 [time axis marking $S_{N(t)}$ and $S_{N(t)+1}$]
Hence,

\[ \frac{S_{N(t)+1}}{N(t)} \to \mu \quad \text{as } t \to \infty \]

The result now follows by Equation (7.6) since $t/N(t)$ is between two random variables,
each of which converges to $\mu$ as $t \to \infty$. ■

Remarks 


(i) The preceding proposition is true even when $\mu$, the mean time between renewals,
is infinite. In this case, we interpret $1/\mu$ to be 0.

(ii) The number $1/\mu$ is called the rate of the renewal process.

(iii) Because the average time between renewals is $\mu$, it is quite intuitive that the average
rate at which renewals occur is 1 per every $\mu$ time units. ■

Example 7.4 Beverly has a radio that works on a single battery. As soon as the 
battery in use fails, Beverly immediately replaces it with a new battery. If the lifetime 
of a battery (in hours) is distributed uniformly over the interval (30,60), then at what 
rate does Beverly have to change batteries? 

Solution: If we let $N(t)$ denote the number of batteries that have failed by time
$t$, we have by Proposition 7.1 that the rate at which Beverly replaces batteries is
given by

\[ \lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{\mu} = \frac{1}{45} \]

That is, in the long run, Beverly will have to replace one battery every 45 hours. ■

Example 7.5 Suppose in Example 7.4 that Beverly does not keep any surplus 
batteries on hand, and so each time a failure occurs she must go and buy a new battery. 
If the amount of time it takes for her to get a new battery is uniformly distributed over 
(0, 1), then what is the average rate that Beverly changes batteries? 

Solution: In this case the mean time between renewals is given by

\[ \mu = E[U_1] + E[U_2] \]

where $U_1$ is uniform over (30, 60) and $U_2$ is uniform over (0, 1). Hence,

\[ \mu = 45 + \tfrac{1}{2} = 45\tfrac{1}{2} \]

and so in the long run, Beverly will be putting in a new battery at the rate of $\tfrac{2}{91}$. That
is, she will put in two new batteries every 91 hours. ■
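The long-run rates in Examples 7.4 and 7.5 can be illustrated by simulation. The following sketch is an assumption-laden illustration, not part of the original text: the helper name `long_run_rate`, the seed, and the horizon of one million hours are arbitrary. It counts renewals up to the horizon and divides by the horizon, which by Proposition 7.1 should be close to $1/\mu$.

```python
import random


def long_run_rate(draw_interarrival, horizon, rng):
    """Return N(horizon) / horizon for one simulated renewal process."""
    t, count = 0.0, 0
    while True:
        t += draw_interarrival(rng)
        if t > horizon:
            return count / horizon
        count += 1


rng = random.Random(12345)  # fixed seed so that the run is reproducible

# Example 7.4: battery lifetimes uniform on (30, 60) hours.
rate_74 = long_run_rate(lambda r: r.uniform(30, 60), 1_000_000, rng)

# Example 7.5: lifetime plus a uniform (0, 1) trip to buy the replacement.
rate_75 = long_run_rate(lambda r: r.uniform(30, 60) + r.uniform(0, 1),
                        1_000_000, rng)

if __name__ == "__main__":
    print("Example 7.4: simulated", rate_74, "theory", 1 / 45)
    print("Example 7.5: simulated", rate_75, "theory", 2 / 91)
```

A single long run is used here rather than an average over many runs because Proposition 7.1 is a statement about almost every sample path.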

Example 7.6 Suppose that potential customers arrive at a single-server bank in accordance
with a Poisson process having rate $\lambda$. However, suppose that the potential customer
will enter the bank only if the server is free when he arrives. That is, if there is
already a customer in the bank, then our arriver, rather than entering the bank, will go
home. If we assume that the amount of time spent in the bank by an entering customer
is a random variable having distribution $G$, then

(a) what is the rate at which customers enter the bank?

(b) what proportion of potential customers actually enter the bank?

Solution: In answering these questions, let us suppose that at time 0 a customer
has just entered the bank. (That is, we define the process to start when the first
customer enters the bank.) If we let $\mu_G$ denote the mean service time, then, by the
memoryless property of the Poisson process, it follows that the mean time between
entering customers is

\[ \mu = \mu_G + \frac{1}{\lambda} \]

Hence, the rate at which customers enter the bank will be given by

\[ \frac{1}{\mu} = \frac{\lambda}{1 + \lambda \mu_G} \]

On the other hand, since potential customers will be arriving at a rate $\lambda$, it follows
that the proportion of them entering the bank will be given by

\[ \frac{\lambda / (1 + \lambda \mu_G)}{\lambda} = \frac{1}{1 + \lambda \mu_G} \]

In particular if $\lambda = 2$ and $\mu_G = 2$, then only one customer out of five will actually
enter the system.
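The proportion $1/(1 + \lambda \mu_G)$ of Example 7.6 can be illustrated with a short simulation. This is a sketch under stated assumptions, not part of the original text: the service distribution $G$ is taken to be uniform on (0, 4), chosen only because its mean is 2, and the function name, seed, and sample size are arbitrary.

```python
import random


def fraction_entering(lam, draw_service, n_arrivals, rng):
    """Fraction of Poisson(lam) arrivals that find the single server free."""
    t = 0.0          # current arrival time
    free_at = 0.0    # time at which the server next becomes free
    entered = 0
    for _ in range(n_arrivals):
        t += rng.expovariate(lam)          # exponential gap between arrivals
        if t >= free_at:                   # server idle: the customer enters
            entered += 1
            free_at = t + draw_service(rng)
        # otherwise the arrival finds the server busy and goes home
    return entered / n_arrivals


frac = fraction_entering(2.0, lambda r: r.uniform(0, 4), 200_000,
                         random.Random(7))

if __name__ == "__main__":
    print("simulated proportion entering:", frac)
    print("1 / (1 + lambda * mu_G)      :", 1 / (1 + 2.0 * 2.0))
```

Because only the mean of $G$ enters the formula, replacing the uniform service time by any other distribution with mean 2 should give a similar proportion.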


A somewhat unusual application of Proposition 7.1 is provided by our next example. 

Example 7.7 A sequence of independent trials, each of which results in outcome
number $i$ with probability $P_i$, $i = 1, \ldots, n$, $\sum_{i=1}^{n} P_i = 1$, is observed until the same
outcome occurs $k$ times in a row; this outcome then is declared to be the winner of the
game. For instance, if $k = 2$ and the sequence of outcomes is 1, 2, 4, 3, 5, 2, 1, 3, 3,
then we stop after nine trials and declare outcome number 3 the winner. What is the
probability that $i$ wins, $i = 1, \ldots, n$, and what is the expected number of trials?

Solution: We begin by computing the expected number of coin tosses, call it $E[T]$,
until a run of $k$ successive heads occurs when the tosses are independent and each
lands on heads with probability $p$. By conditioning on the time of the first nonhead,
we obtain

\[ E[T] = \sum_{j=1}^{k} (1 - p) p^{j-1} (j + E[T]) + k p^k \]

Solving this for $E[T]$ yields

\[ E[T] = k + \frac{1 - p}{p^k} \sum_{j=1}^{k} j p^{j-1} \]

Upon simplifying, we obtain

\[ E[T] = \frac{1 + p + \cdots + p^{k-1}}{p^k} = \frac{1 - p^k}{p^k (1 - p)} \tag{7.7} \]


Now, let us return to our example, and let us suppose that as soon as the winner
of a game has been determined we immediately begin playing another game. For
each $i$ let us determine the rate at which outcome $i$ wins. Now, every time $i$ wins,
everything starts over again and thus wins by $i$ constitute renewals. Hence, from
Proposition 7.1, the

\[ \text{rate at which } i \text{ wins} = \frac{1}{E[N_i]} \]

where $N_i$ denotes the number of trials played between successive wins of outcome $i$.
Hence, from Equation (7.7) we see that

\[ \text{rate at which } i \text{ wins} = \frac{P_i^k (1 - P_i)}{1 - P_i^k} \tag{7.8} \]


Hence, the long-run proportion of games that are won by number $i$ is given by

\[ \begin{aligned}
\text{proportion of games } i \text{ wins} &= \frac{\text{rate at which } i \text{ wins}}{\sum_{j=1}^{n} \text{rate at which } j \text{ wins}} \\
&= \frac{P_i^k (1 - P_i) / (1 - P_i^k)}{\sum_{j=1}^{n} P_j^k (1 - P_j) / (1 - P_j^k)}
\end{aligned} \]

However, it follows from the strong law of large numbers that the long-run proportion
of games that $i$ wins will, with probability 1, be equal to the probability that $i$ wins
any given game. Hence,

\[ P\{i \text{ wins}\} = \frac{P_i^k (1 - P_i) / (1 - P_i^k)}{\sum_{j=1}^{n} P_j^k (1 - P_j) / (1 - P_j^k)} \]

To compute the expected time of a game, we first note that the

\[ \begin{aligned}
\text{rate at which games end} &= \sum_{i=1}^{n} \text{rate at which } i \text{ wins} \\
&= \sum_{i=1}^{n} \frac{P_i^k (1 - P_i)}{1 - P_i^k} \quad \text{(from Equation (7.8))}
\end{aligned} \]
Now, as everything starts over when a game ends, it follows by Proposition 7.1 that
the rate at which games end is equal to the reciprocal of the mean time of a game.
Hence,

\[ \begin{aligned}
E[\text{time of a game}] &= \frac{1}{\text{rate at which games end}} \\
&= \frac{1}{\sum_{i=1}^{n} P_i^k (1 - P_i) / (1 - P_i^k)}
\end{aligned} \]
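The formulas of Example 7.7 are easy to evaluate directly. The sketch below is illustrative only; the function names are assumptions, and each outcome probability is assumed to lie strictly between 0 and 1. It computes Equation (7.7), the win probabilities, and the expected length of a game from the rates in Equation (7.8).

```python
def expected_trials_for_run(p, k):
    """Equation (7.7): mean number of trials until k consecutive successes."""
    return (1 - p**k) / (p**k * (1 - p))


def run_game(probs, k):
    """Return (win probabilities, expected number of trials in a game).

    probs[i] is the probability of outcome i on each trial; every entry is
    assumed to lie strictly between 0 and 1.
    """
    # Equation (7.8): rate at which each outcome wins.
    rates = [1 / expected_trials_for_run(p, k) for p in probs]
    total_rate = sum(rates)              # rate at which games end
    win_probs = [r / total_rate for r in rates]
    return win_probs, 1 / total_rate


if __name__ == "__main__":
    win_probs, mean_length = run_game([0.5, 0.3, 0.2], k=2)
    print("win probabilities:", win_probs)
    print("expected trials per game:", mean_length)
```

Two sanity checks: with $k = 1$ the first trial decides the game, so the win probabilities reduce to the $P_i$ and the expected length is 1; with two equally likely outcomes and $k = 2$ the expected length is 3.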


Proposition 7.1 says that the average renewal rate up to time $t$ will, with probability
1, converge to $1/\mu$ as $t \to \infty$. What about the expected average renewal rate? Is it
true that $m(t)/t$ also converges to $1/\mu$? This result is known as the elementary renewal
theorem.

Theorem 7.1 (Elementary Renewal Theorem)

\[ \frac{m(t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty \]

As before, $1/\mu$ is interpreted as 0 when $\mu = \infty$.

Remark At first glance it might seem that the elementary renewal theorem should be
a simple consequence of Proposition 7.1. That is, since the average renewal rate will,
with probability 1, converge to $1/\mu$, should this not imply that the expected average
renewal rate also converges to $1/\mu$? We must, however, be careful; consider the next
example.

Example 7.8 Let $U$ be a random variable which is uniformly distributed on (0, 1);
and define the random variables $Y_n$, $n \ge 1$, by

\[ Y_n = \begin{cases} 0, & \text{if } U > 1/n \\ n, & \text{if } U \le 1/n \end{cases} \]

Now, since, with probability 1, $U$ will be greater than 0, it follows that $Y_n$ will equal
0 for all sufficiently large $n$. That is, $Y_n$ will equal 0 for all $n$ large enough so that
$1/n < U$. Hence, with probability 1,

\[ Y_n \to 0 \quad \text{as } n \to \infty \]

However,

\[ E[Y_n] = n P\left\{ U \le \frac{1}{n} \right\} = n \cdot \frac{1}{n} = 1 \]

Therefore, even though the sequence of random variables $Y_n$ converges to 0, the
expected values of the $Y_n$ are all identically 1. ■

To prove the elementary renewal theorem we will make use of an identity known as 
Wald’s equation. Before stating Wald’s equation we need to introduce the concept of a 
stopping time for a sequence of independent random variables. 
Definition The nonnegative integer valued random variable $N$ is said to be a stopping
time for a sequence of independent random variables $X_1, X_2, \ldots$ if the event
$\{N = n\}$ is independent of $X_{n+1}, X_{n+2}, \ldots$, for all $n = 1, 2, \ldots$.

The idea behind a stopping time is that we imagine that the $X_i$ are observed in
sequence, first $X_1$, then $X_2$, and so on, and that $N$ denotes the number of them observed
before stopping. Because the event that we stop after having observed $X_1, \ldots, X_n$
can only depend on these $n$ values, and not on future unobserved values, it must be
independent of these future values.

Example 7.9 Suppose that $X_1, X_2, \ldots$ is a sequence of independent and identically
distributed random variables with

\[ P\{X_i = 1\} = p = 1 - P\{X_i = 0\} \]

where $p > 0$. If we define

\[ N = \min(n : X_1 + \cdots + X_n = r) \]

then $N$ is a stopping time for the sequence. If we imagine that trials are being performed
in sequence and that $X_i = 1$ corresponds to a success on trial $i$, then $N$ is the number of
trials needed until there have been a total of $r$ successes when each trial is independently
a success with probability $p$. ■

Example 7.10 Suppose that $X_1, X_2, \ldots$ is a sequence of independent and identically
distributed random variables with

\[ P\{X_i = 1\} = 1/2 = 1 - P\{X_i = -1\} \]

If

\[ N = \min(n : X_1 + \cdots + X_n = 1) \]

then $N$ is a stopping time for the sequence. $N$ can be regarded as the stopping time for
a gambler who on each play is equally likely to win or lose 1, and who is going to stop
the first time he is winning money. (Because the successive winnings of the gambler
are a symmetric random walk, which we showed in Chapter 4 to be a recurrent Markov
chain, it follows that $P(N < \infty) = 1$.) ■

We are now ready for Wald’s equation. 

Theorem 7.2 (Wald’s Equation) If $X_1, X_2, \ldots$ is a sequence of independent and
identically distributed random variables with finite expectation $E[X]$, and if $N$ is a
stopping time for this sequence such that $E[N] < \infty$, then

\[ E\left[ \sum_{n=1}^{N} X_n \right] = E[N]E[X] \]


Proof. For $n = 1, 2, \ldots$, let

\[ I_n = \begin{cases} 1, & \text{if } n \le N \\ 0, & \text{if } n > N \end{cases} \]

and note that

\[ \sum_{n=1}^{N} X_n = \sum_{n=1}^{\infty} X_n I_n \]

Taking expectations gives

\[ E\left[ \sum_{n=1}^{N} X_n \right] = E\left[ \sum_{n=1}^{\infty} X_n I_n \right] = \sum_{n=1}^{\infty} E[X_n I_n] \]

Now $I_n = 1$ if $N \ge n$, which means that $I_n = 1$ if we have not yet stopped after having
observed $X_1, \ldots, X_{n-1}$. But this implies that the value of $I_n$ is determined before $X_n$
has been observed, and thus $X_n$ is independent of $I_n$. Consequently,

\[ E[X_n I_n] = E[X_n]E[I_n] = E[X]E[I_n] \]

showing that

\[ \begin{aligned}
E\left[ \sum_{n=1}^{N} X_n \right] &= E[X] \sum_{n=1}^{\infty} E[I_n] \\
&= E[X] E\left[ \sum_{n=1}^{\infty} I_n \right] \\
&= E[X]E[N]
\end{aligned} \]
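Wald's equation can be illustrated with the stopping time of Example 7.9. There the sum $X_1 + \cdots + X_N$ equals $r$ on every sample path, so Wald's equation gives $r = E[N]p$, that is, $E[N] = r/p$. The simulation below is a sketch, not part of the original text; the function name, the seed, and the values $r = 3$, $p = 0.25$ are arbitrary choices.

```python
import random


def trials_until_r_successes(r, p, rng):
    """Sample N = min(n : X_1 + ... + X_n = r) for Bernoulli(p) trials."""
    n = successes = 0
    while successes < r:
        n += 1
        if rng.random() < p:   # X_n = 1 with probability p, else 0
            successes += 1
    return n


rng = random.Random(2024)      # fixed seed for a reproducible run
r, p, reps = 3, 0.25, 50_000
mean_n = sum(trials_until_r_successes(r, p, rng) for _ in range(reps)) / reps

if __name__ == "__main__":
    print("simulated E[N]:", mean_n)
    print("Wald: r / p   :", r / p)
    # The sum X_1 + ... + X_N is exactly r, so E[sum] = r = E[N] * E[X].
    print("E[N] * E[X]   :", mean_n * p, "versus r =", r)
```

The decision to stop after trial $n$ depends only on the first $n$ trials, which is what makes $N$ a stopping time and the equation applicable.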


To apply Wald’s equation to renewal theory, let $X_1, X_2, \ldots$ be the sequence of interarrival
times of a renewal process. If we observe these one at a time and then stop at the
first renewal after time $t$, then we would stop after having observed $X_1, \ldots, X_{N(t)+1}$,
showing that $N(t) + 1$ is a stopping time for the sequence of interarrival times. For a
more formal argument that $N(t) + 1$ is a stopping time for the sequence of interarrival
times, note that $N(t) = n - 1$ if and only if the $(n - 1)$st renewal occurs by time $t$ and
the $n$th renewal occurs after time $t$. That is,

\[ N(t) + 1 = n \iff N(t) = n - 1 \iff X_1 + \cdots + X_{n-1} \le t, \; X_1 + \cdots + X_n > t \]

showing that the event that $N(t) + 1 = n$ depends only on the values of $X_1, \ldots, X_n$.
We thus have the following corollary of Wald’s equation.

Proposition 7.2 If $X_1, X_2, \ldots$ are the interarrival times of a renewal process then

\[ E[X_1 + \cdots + X_{N(t)+1}] = E[X]E[N(t) + 1] \]

That is,

\[ E[S_{N(t)+1}] = \mu[m(t) + 1] \]

We are now ready to prove the elementary renewal theorem. 
Proof of Elementary Renewal Theorem: Because $S_{N(t)+1}$ is the time of the first
renewal after $t$, it follows that

\[ S_{N(t)+1} = t + Y(t) \]

where $Y(t)$, called the excess at time $t$, is defined as the time from $t$ until the next
renewal. Taking expectations of the preceding yields, upon applying Proposition 7.2,
that

\[ \mu(m(t) + 1) = t + E[Y(t)] \tag{7.9} \]

which can be written as

\[ \frac{m(t)}{t} = \frac{1}{\mu} + \frac{E[Y(t)]}{t\mu} - \frac{1}{t} \]

Because $Y(t) \ge 0$, the preceding yields that $\frac{m(t)}{t} \ge \frac{1}{\mu} - \frac{1}{t}$, showing that

\[ \lim_{t \to \infty} \frac{m(t)}{t} \ge \frac{1}{\mu} \]

To show that $\lim_{t \to \infty} \frac{m(t)}{t} \le \frac{1}{\mu}$, let us suppose that there is a value $M < \infty$ such that
$P(X_i < M) = 1$. Because this implies that $Y(t)$ must also be less than $M$, we have
that $E[Y(t)] < M$, and so

\[ \frac{m(t)}{t} \le \frac{1}{\mu} + \frac{M}{t\mu} - \frac{1}{t} \]

which gives that

\[ \lim_{t \to \infty} \frac{m(t)}{t} \le \frac{1}{\mu} \]

and thus completes the proof of the elementary renewal theorem when the interarrival
times are bounded. When the interarrival times $X_1, X_2, \ldots$ are unbounded, fix $M > 0$,
and let $N_M(t)$, $t \ge 0$ be the renewal process with interarrival times $\min(X_i, M)$, $i \ge 1$.
Because $\min(X_i, M) \le X_i$ for all $i$, it follows that $N_M(t) \ge N(t)$ for all $t$. (That
is, because each interarrival time of $N_M(t)$ is no larger than its corresponding interarrival
time of $N(t)$, it must have at least as many renewals by time $t$.) Consequently,
$E[N(t)] \le E[N_M(t)]$, showing that

\[ \lim_{t \to \infty} \frac{E[N(t)]}{t} \le \lim_{t \to \infty} \frac{E[N_M(t)]}{t} = \frac{1}{E[\min(X_i, M)]} \]

where the equality follows because the interarrival times of $N_M(t)$ are bounded. Using
that $\lim_{M \to \infty} E[\min(X_i, M)] = E[X_i] = \mu$, we obtain from the preceding upon
letting $M \to \infty$ that

\[ \lim_{t \to \infty} \frac{m(t)}{t} \le \frac{1}{\mu} \]

and the proof is completed.
Equation (7.9) shows that if we can determine $E[Y(t)]$, the mean excess at time $t$, then
we can compute $m(t)$ and vice versa.

Example 7.11 Consider the renewal process whose interarrival distribution is the
convolution of two exponentials; that is,

\[ F = F_1 * F_2, \quad \text{where } F_i(t) = 1 - e^{-\mu_i t}, \quad i = 1, 2 \]

We will determine the renewal function by first determining $E[Y(t)]$. To obtain the
mean excess at $t$, imagine that each renewal corresponds to a new machine being put
in use, and suppose that each machine has two components: initially component 1
is employed and this lasts an exponential time with rate $\mu_1$, and then component 2,
which functions for an exponential time with rate $\mu_2$, is employed. When component 2
fails, a new machine is put in use (that is, a renewal occurs). Now consider the process
$\{X(t), t \ge 0\}$ where $X(t)$ is $i$ if a type $i$ component is in use at time $t$. It is easy to
see that $\{X(t), t \ge 0\}$ is a two-state continuous-time Markov chain, and so, using the
results of Example 6.11, its transition probabilities are

\[ P_{11}(t) = \frac{\mu_1}{\mu_1 + \mu_2} e^{-(\mu_1 + \mu_2)t} + \frac{\mu_2}{\mu_1 + \mu_2} \]

To compute the expected remaining life of the machine in use at time $t$, we condition
on whether it is using its first or second component: for if it is still using its first
component, then its expected remaining life is $1/\mu_1 + 1/\mu_2$, whereas if it is already using its
second component, then its expected remaining life is $1/\mu_2$. Hence, letting $p(t)$ denote the
probability that the machine in use at time $t$ is using its first component, we have

\[ E[Y(t)] = \left( \frac{1}{\mu_1} + \frac{1}{\mu_2} \right) p(t) + \frac{1 - p(t)}{\mu_2} = \frac{1}{\mu_2} + \frac{p(t)}{\mu_1} \]

But, since at time 0 the first machine is utilizing its first component, it follows that
$p(t) = P_{11}(t)$, and so, upon using the preceding expression of $P_{11}(t)$, we obtain

\[ E[Y(t)] = \frac{1}{\mu_2} + \frac{1}{\mu_1 + \mu_2} e^{-(\mu_1 + \mu_2)t} + \frac{\mu_2}{\mu_1(\mu_1 + \mu_2)} \tag{7.10} \]

Now it follows from Equation (7.9) that

\[ m(t) + 1 = \frac{t}{\mu} + \frac{E[Y(t)]}{\mu} \tag{7.11} \]

where $\mu$, the mean interarrival time, is given in this case by

\[ \mu = \frac{1}{\mu_1} + \frac{1}{\mu_2} = \frac{\mu_1 + \mu_2}{\mu_1 \mu_2} \]

Substituting Equation (7.10) and the preceding equation into (7.11) yields, after
simplifying,

\[ m(t) = \frac{\mu_1 \mu_2}{\mu_1 + \mu_2} t - \frac{\mu_1 \mu_2}{(\mu_1 + \mu_2)^2} \left[ 1 - e^{-(\mu_1 + \mu_2)t} \right] \]
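The simplification at the end of Example 7.11 can be checked numerically. The following sketch is not part of the original text and its function names are illustrative assumptions. It evaluates the mean excess of Equation (7.10), obtains $m(t)$ from the relation $\mu(m(t) + 1) = t + E[Y(t)]$ of Equation (7.9), and compares the result with the closed form for $m(t)$.

```python
import math


def mean_excess(t, mu1, mu2):
    """E[Y(t)] from Equation (7.10)."""
    s = mu1 + mu2
    return 1 / mu2 + math.exp(-s * t) / s + mu2 / (mu1 * s)


def m_from_excess(t, mu1, mu2):
    """m(t) from mu * (m(t) + 1) = t + E[Y(t)], with mu = 1/mu1 + 1/mu2."""
    mean_interarrival = 1 / mu1 + 1 / mu2
    return (t + mean_excess(t, mu1, mu2)) / mean_interarrival - 1


def m_closed_form(t, mu1, mu2):
    """Simplified renewal function stated at the end of Example 7.11."""
    s = mu1 + mu2
    return mu1 * mu2 * t / s - mu1 * mu2 / s**2 * (1 - math.exp(-s * t))


if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0, 10.0):
        print(t, m_from_excess(t, 1.5, 0.7), m_closed_form(t, 1.5, 0.7))
```

Both columns should agree to rounding error, and both should give $m(0) = 0$, since no renewal can occur by time 0.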
Remark Using the relationship of Equation (7.11) and results from the two-state
continuous-time Markov chain, the renewal function can also be obtained in the same
manner as in Example 7.11 for the interarrival distributions

\[ F(t) = pF_1(t) + (1 - p)F_2(t) \]

and

\[ F(t) = pF_1(t) + (1 - p)(F_1 * F_2)(t) \]

when $F_i(t) = 1 - e^{-\mu_i t}$, $t > 0$, $i = 1, 2$. ■

Suppose the interarrival times of a renewal process are all positive integer valued.
Let

\[ I_i = \begin{cases} 1, & \text{if there is a renewal at time } i \\ 0, & \text{otherwise} \end{cases} \]

and note that $N(n)$, the number of renewals by time $n$, can be expressed as

\[ N(n) = \sum_{i=1}^{n} I_i \]

Taking expectations of both sides of the preceding shows that

\[ m(n) = E[N(n)] = \sum_{i=1}^{n} P(\text{renewal at time } i) \]

Hence, the elementary renewal theorem yields

\[ \frac{\sum_{i=1}^{n} P(\text{renewal at time } i)}{n} \to \frac{1}{E[\text{time between renewals}]} \]

Now, for a sequence of numbers $a_1, a_2, \ldots$ it can be shown that

\[ \lim_{n \to \infty} a_n = a \implies \lim_{n \to \infty} \frac{\sum_{i=1}^{n} a_i}{n} = a \]

Hence, if $\lim_{n \to \infty} P(\text{renewal at time } n)$ exists then that limit must equal

\[ \frac{1}{E[\text{time between renewals}]} \]

Example 7.12 Let $X_i$, $i \ge 1$ be independent and identically distributed random
variables, and set

\[ S_0 = 0, \qquad S_n = \sum_{i=1}^{n} X_i, \quad n > 0 \]

The process $\{S_n, n \ge 0\}$ is called a random walk process. Suppose that $E[X_i] < 0$.
The strong law of large numbers yields

\[ \lim_{n \to \infty} \frac{S_n}{n} \to E[X_i] \]
But if $S_n$ divided by $n$ is converging to a negative number, then $S_n$ must be going to
minus infinity. Let $\alpha$ be the probability that the random walk is always negative after
the initial movement. That is,

\[ \alpha = P(S_n < 0 \text{ for all } n \ge 1) \]

To determine $\alpha$, define a counting process by saying that an event occurs at time $n$
if $S_n < \min(0, S_1, \ldots, S_{n-1})$. That is, an event occurs each time the random walk
process reaches a new low. Now, if an event occurs at time $n$, then the next event will
occur $k$ time units later if

\[ X_{n+1} \ge 0, \; X_{n+1} + X_{n+2} \ge 0, \; \ldots, \; X_{n+1} + \cdots + X_{n+k-1} \ge 0, \; X_{n+1} + \cdots + X_{n+k} < 0 \]

Because $X_i$, $i \ge 1$ are independent and identically distributed the preceding event is
independent of the values of $X_1, \ldots, X_n$, and its probability of occurrence does not
depend on $n$. Consequently, the times between successive events are independent and
identically distributed, showing that the counting process is a renewal process. Now,

\[ \begin{aligned}
P(\text{renewal at } n) &= P(S_n < 0, S_n < S_1, S_n < S_2, \ldots, S_n < S_{n-1}) \\
&= P(X_1 + \cdots + X_n < 0, X_2 + \cdots + X_n < 0, X_3 + \cdots + X_n < 0, \ldots, X_n < 0)
\end{aligned} \]

Because $X_n, X_{n-1}, \ldots, X_1$ has the same joint distribution as does $X_1, X_2, \ldots, X_n$ it
follows that the value of the preceding probability would be unchanged if $X_1$ were
replaced by $X_n$; $X_2$ were replaced by $X_{n-1}$; $X_3$ were replaced by $X_{n-2}$; and so on.
Consequently,

\[ \begin{aligned}
P(\text{renewal at } n) &= P(X_n + \cdots + X_1 < 0, X_{n-1} + \cdots + X_1 < 0, X_{n-2} + \cdots + X_1 < 0, \ldots, X_1 < 0) \\
&= P(S_n < 0, S_{n-1} < 0, S_{n-2} < 0, \ldots, S_1 < 0)
\end{aligned} \]

Hence,

\[ \lim_{n \to \infty} P(\text{renewal at } n) = P(S_n < 0 \text{ for all } n \ge 1) = \alpha \]

But, by the elementary renewal theorem, this implies that

\[ \alpha = 1/E[T] \]

where $T$ is the time between renewals. That is,

\[ T = \min\{n : S_n < 0\} \]

For instance, in the case of left skip free random walks (which are ones for which
$\sum_{j=-1}^{\infty} P(X_i = j) = 1$) we showed in Section 3.6.6 that $E[T] = -1/E[X_i]$ when
$E[X_i] < 0$, showing that for skip free random walks having a negative mean,

\[ P(S_n < 0 \text{ for all } n) = -E[X_i] \]

which verifies a result previously obtained in Section 3.6.6. ■
An important limit theorem is the central limit theorem for renewal processes. This
states that, for large $t$, $N(t)$ is approximately normally distributed with mean $t/\mu$
and variance $t\sigma^2/\mu^3$, where $\mu$ and $\sigma^2$ are, respectively, the mean and variance of the
interarrival distribution. That is, we have the following theorem which we state without
proof.

Theorem 7.3 (Central Limit Theorem for Renewal Processes)

\[ \lim_{t \to \infty} P\left\{ \frac{N(t) - t/\mu}{\sqrt{t\sigma^2/\mu^3}} < x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy \]

In addition, as might be expected from the central limit theorem for renewal processes,
it can be shown that $\mathrm{Var}(N(t))/t$ converges to $\sigma^2/\mu^3$. That is, it can be shown
that

\[ \lim_{t \to \infty} \frac{\mathrm{Var}(N(t))}{t} = \sigma^2/\mu^3 \tag{7.12} \]


Example 7.13 Two machines continually process an unending number of jobs. The
time that it takes to process a job on machine 1 is a gamma random variable with
parameters $n = 4$, $\lambda = 2$, whereas the time that it takes to process a job on machine
2 is uniformly distributed between 0 and 4. Approximate the probability that together
the two machines can process at least 90 jobs by time $t = 100$.

Solution: If we let $N_i(t)$ denote the number of jobs that machine $i$ can process by
time $t$, then $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are independent renewal processes.

The interarrival distribution of the first renewal process is gamma with parameters
$n = 4$, $\lambda = 2$, and thus has mean 2 and variance 1. Correspondingly, the interarrival
distribution of the second renewal process is uniform between 0 and 4, and thus has
mean 2 and variance 16/12.

Therefore, $N_1(100)$ is approximately normal with mean 50 and variance 100/8;
and $N_2(100)$ is approximately normal with mean 50 and variance 100/6. Hence,
$N_1(100) + N_2(100)$ is approximately normal with mean 100 and variance 175/6.
Thus, with $\Phi$ denoting the standard normal distribution function, we have

$$
\begin{aligned}
P\{N_1(100) + N_2(100) > 89.5\} &= P\left\{ \frac{N_1(100) + N_2(100) - 100}{\sqrt{175/6}} > \frac{89.5 - 100}{\sqrt{175/6}} \right\} \\
&\approx 1 - \Phi\left( \frac{-10.5}{\sqrt{175/6}} \right) \\
&\approx \Phi\left( \frac{10.5}{\sqrt{175/6}} \right) \\
&\approx \Phi(1.944) \\
&\approx 0.9741
\end{aligned}
$$
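
The arithmetic of Example 7.13 can be checked numerically. The following is only a sketch: the helper names `normal_cdf` and `renewal_clt_params` are illustrative and not part of the text, and the normal distribution function is computed from the error function in Python's standard library.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def renewal_clt_params(t, mean, var):
    """Approximate mean and variance of N(t): t/mu and t*sigma^2/mu^3."""
    return t / mean, t * var / mean**3

# Machine 1: gamma(n = 4, lambda = 2) has mean 2 and variance 1.
m1, v1 = renewal_clt_params(100, 2.0, 1.0)
# Machine 2: uniform(0, 4) has mean 2 and variance 16/12.
m2, v2 = renewal_clt_params(100, 2.0, 16 / 12)

total_mean = m1 + m2          # 100
total_var = v1 + v2           # 175/6, by independence
z = (89.5 - total_mean) / sqrt(total_var)
prob = 1.0 - normal_cdf(z)    # P{N_1(100) + N_2(100) > 89.5}
print(round(prob, 4))
```

The continuity correction (89.5 in place of 90) is the one used in the example.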
7.4 Renewal Reward Processes


A large number of probability models are special cases of the following model. Consider
a renewal process $\{N(t), t \ge 0\}$ having interarrival times $X_n$, $n \ge 1$, and suppose that
each time a renewal occurs we receive a reward. We denote by $R_n$ the reward earned
at the time of the $n$th renewal. We shall assume that the $R_n$, $n \ge 1$, are independent
and identically distributed. However, we do allow for the possibility that $R_n$ may (and
usually will) depend on $X_n$, the length of the $n$th renewal interval. If we let

$$R(t) = \sum_{n=1}^{N(t)} R_n$$

then $R(t)$ represents the total reward earned by time $t$. Let

$$E[R] = E[R_n], \qquad E[X] = E[X_n]$$

Proposition 7.3 If $E[R] < \infty$ and $E[X] < \infty$, then

(a) with probability 1, $\displaystyle \lim_{t \to \infty} \frac{R(t)}{t} = \frac{E[R]}{E[X]}$

(b) $\displaystyle \lim_{t \to \infty} \frac{E[R(t)]}{t} = \frac{E[R]}{E[X]}$


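Proof. We give the proof for (a) only. To prove this, write

$$\frac{R(t)}{t} = \frac{\sum_{n=1}^{N(t)} R_n}{t} = \left( \frac{\sum_{n=1}^{N(t)} R_n}{N(t)} \right) \left( \frac{N(t)}{t} \right)$$

By the strong law of large numbers we obtain

$$\frac{\sum_{n=1}^{N(t)} R_n}{N(t)} \to E[R] \quad \text{as } t \to \infty$$

and by Proposition 7.1

$$\frac{N(t)}{t} \to \frac{1}{E[X]} \quad \text{as } t \to \infty$$

The result thus follows.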


Remark 


(i) If we say that a cycle is completed every time a renewal occurs, then Proposition 7.3
states that the long-run average reward per unit time is equal to the expected reward
earned during a cycle divided by the expected length of a cycle. For instance, in
Example 7.6 if we suppose that the amounts that the successive customers deposit
in the bank are independent random variables having a common distribution $H$, then
the rate at which deposits accumulate, that is, $\lim_{t \to \infty} (\text{total deposits by time } t)/t$,
is given by

$$\frac{E[\text{deposits during a cycle}]}{E[\text{time of cycle}]} = \frac{\mu_H}{\mu_G + 1/\lambda}$$

where $\mu_G + 1/\lambda$ is the mean time of a cycle, and $\mu_H$ is the mean of the distribution $H$.

(ii) Although we have supposed that the reward is earned at the time of a renewal, the
result remains valid when the reward is earned gradually throughout the renewal
cycle.

Example 7.14 (A Car Buying Model) The lifetime of a car is a continuous random
variable having a distribution $H$ and probability density $h$. Mr. Brown has a policy that
he buys a new car as soon as his old one either breaks down or reaches the age of $T$
years. Suppose that a new car costs $C_1$ dollars and also that an additional cost of $C_2$
dollars is incurred whenever Mr. Brown's car breaks down. Under the assumption that
a used car has no resale value, what is Mr. Brown's long-run average cost?

If we say that a cycle is complete every time Mr. Brown gets a new car, then it
follows from Proposition 7.3 (with costs replacing rewards) that his long-run average
cost equals

$$\frac{E[\text{cost incurred during a cycle}]}{E[\text{length of a cycle}]}$$

Now letting $X$ be the lifetime of Mr. Brown's car during an arbitrary cycle, then the
cost incurred during that cycle will be given by

$$\begin{cases} C_1, & \text{if } X > T \\ C_1 + C_2, & \text{if } X \le T \end{cases}$$

so the expected cost incurred over a cycle is

$$C_1 P\{X > T\} + (C_1 + C_2) P\{X \le T\} = C_1 + C_2 H(T)$$

Also, the length of the cycle is

$$\begin{cases} X, & \text{if } X \le T \\ T, & \text{if } X > T \end{cases}$$

and so the expected length of a cycle is

$$\int_0^T x h(x)\, dx + \int_T^\infty T h(x)\, dx = \int_0^T x h(x)\, dx + T[1 - H(T)]$$

Therefore, Mr. Brown's long-run average cost will be

$$\frac{C_1 + C_2 H(T)}{\int_0^T x h(x)\, dx + T[1 - H(T)]} \tag{7.13}$$

Now, suppose that the lifetime of a car (in years) is uniformly distributed over (0, 10),
and suppose that $C_1$ is 3 (thousand) dollars and $C_2$ is 1/2 (thousand) dollars. What value
of $T$ minimizes Mr. Brown's long-run average cost?

If Mr. Brown uses the value $T$, $T \le 10$, then from Equation (7.13) his long-run
average cost equals

$$\frac{3 + \frac{1}{2}(T/10)}{\int_0^T (x/10)\, dx + T(1 - T/10)} = \frac{3 + T/20}{T^2/20 + (10T - T^2)/10} = \frac{60 + T}{20T - T^2}$$

We can now minimize this by using the calculus. Toward this end, let

$$g(T) = \frac{60 + T}{20T - T^2}$$

then

$$g'(T) = \frac{(20T - T^2) - (60 + T)(20 - 2T)}{(20T - T^2)^2}$$

Equating to 0 yields

$$20T - T^2 = (60 + T)(20 - 2T)$$

or, equivalently,

$$T^2 + 120T - 1200 = 0$$

which yields the solutions $T = -60 \pm \sqrt{4800}$, that is,

$$T \approx 9.28 \quad \text{and} \quad T \approx -129.28$$

Since $T \le 10$, it follows that the optimal policy for Mr. Brown would be to purchase
a new car whenever his old car reaches the age of 9.28 years. ■
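
The minimization in Example 7.14 can be cross-checked by evaluating the cost ratio directly. This is a minimal sketch, assuming uniform (0, 10) lifetimes and the values $C_1 = 3$, $C_2 = 1/2$ used in the derivation above; the function name `average_cost` and the grid search are illustrative choices, not part of the text.

```python
from math import sqrt

def average_cost(T, c1=3.0, c2=0.5, max_life=10.0):
    """Long-run average cost (7.13) for uniform(0, max_life) lifetimes, 0 < T <= max_life."""
    expected_cost = c1 + c2 * (T / max_life)                        # C1 + C2 * H(T)
    expected_length = T**2 / (2 * max_life) + T * (1 - T / max_life)
    return expected_cost / expected_length

# Positive root of T^2 + 120 T - 1200 = 0.
t_star = -60 + sqrt(60**2 + 1200)

# Independent check: brute-force search over a fine grid of T values in (0, 10].
grid = [k / 1000 for k in range(1, 10001)]
t_grid = min(grid, key=average_cost)

print(round(t_star, 3), t_grid, round(average_cost(t_star), 4))
```

The grid minimizer should agree with the root of the quadratic to within the grid spacing, which confirms that the stationary point is a minimum on (0, 10].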

Example 7.15 (Dispatching a Train) Suppose that customers arrive at a train depot
in accordance with a renewal process having a mean interarrival time $\mu$. Whenever
there are $N$ customers waiting in the depot, a train leaves. If the depot incurs a cost at
the rate of $nc$ dollars per unit time whenever there are $n$ customers waiting, what is the
average cost incurred by the depot?

If we say that a cycle is completed whenever a train leaves, then the preceding is a
renewal reward process. The expected length of a cycle is the expected time required
for $N$ customers to arrive and, since the mean interarrival time is $\mu$, this equals

$$E[\text{length of cycle}] = N\mu$$

If we let $T_n$ denote the time between the $n$th and $(n + 1)$st arrival in a cycle, then the
expected cost of a cycle may be expressed as

$$E[\text{cost of a cycle}] = E[c\,T_1 + 2c\,T_2 + \cdots + (N - 1)c\,T_{N-1}]$$

which, since $E[T_n] = \mu$, equals

$$c\mu \frac{N}{2}(N - 1)$$

Hence, the average cost incurred by the depot is

$$\frac{c\mu N(N - 1)}{2N\mu} = \frac{c(N - 1)}{2}$$

Suppose now that each time a train leaves, the depot incurs a cost of six units. What
value of $N$ minimizes the depot's long-run average cost when $c = 2$, $\mu = 1$?

In this case, we have that the average cost per unit time, as a function of $N$, is

$$\frac{6 + c\mu N(N - 1)/2}{N\mu} = N - 1 + \frac{6}{N}$$

By treating this as a continuous function of $N$ and using the calculus, we obtain that
the minimizing value of $N$ is

$$N = \sqrt{6} \approx 2.45$$

Hence, the optimal integral value of $N$ is either 2, which yields a value 4, or 3, which
also yields the value 4. Hence, either $N = 2$ or $N = 3$ minimizes the depot's average
cost. ■
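
The integer minimization at the end of Example 7.15 is easy to verify by tabulating the cost. The sketch below assumes the values $c = 2$, $\mu = 1$ and a dispatch cost of 6 from the example; the function name `depot_average_cost` is illustrative.

```python
def depot_average_cost(n, c=2.0, mu=1.0, dispatch_cost=6.0):
    """Average cost per unit time when a train leaves at n waiting customers."""
    cycle_cost = dispatch_cost + c * mu * n * (n - 1) / 2   # E[cost of a cycle]
    cycle_length = n * mu                                   # E[length of cycle]
    return cycle_cost / cycle_length

costs = {n: depot_average_cost(n) for n in range(1, 11)}
best = min(costs.values())
minimizers = [n for n, v in costs.items() if abs(v - best) < 1e-12]
print(best, minimizers)
```

Tabulating over a small range of $N$ suffices here because $N - 1 + 6/N$ is increasing for $N > \sqrt{6}$.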

Example 7.16 Suppose that customers arrive at a single-server system in accordance
with a Poisson process with rate $\lambda$. Upon arriving a customer must pass through a door
that leads to the server. However, each time someone passes through, the door becomes
locked for the next $t$ units of time. An arrival finding a locked door is lost, and a cost
$c$ is incurred by the system. An arrival finding the door unlocked passes through to
the server. If the server is free, the customer enters service; if the server is busy, the
customer departs without service and a cost $K$ is incurred. If the service time of a
customer is exponential with rate $\mu$, find the average cost per unit time incurred by the
system.

Solution: The preceding can be considered to be a renewal reward process, with
a new cycle beginning each time a customer arrives to find the door unlocked. This
is so because whether or not the arrival finds the server free, the door will become
locked for the next $t$ time units and the server will be busy for a time $X$ that is
exponentially distributed with rate $\mu$. (If the server is free, $X$ is the service time of
the entering customer; if the server is busy, $X$ is the remaining service time of the
customer in service.) Since the next cycle will begin at the first arrival epoch after a
time $t$ has passed, it follows that

$$E[\text{time of a cycle}] = t + 1/\lambda$$

Let $C_1$ denote the cost incurred during a cycle due to arrivals finding the door locked.
Then, since each arrival in the first $t$ time units of a cycle will result in a cost $c$, we
have

$$E[C_1] = \lambda t c$$

Also, let $C_2$ denote the cost incurred during a cycle due to an arrival finding the door
unlocked but the server busy. Then because a cost $K$ is incurred if the server is still
busy a time $t$ after the cycle began and, in addition, the next arrival after that time
occurs before the service completion, we see that

$$E[C_2] = K e^{-\mu t} \frac{\lambda}{\lambda + \mu}$$

Consequently,

$$\text{average cost per unit time} = \frac{\lambda t c + \lambda K e^{-\mu t}/(\lambda + \mu)}{t + 1/\lambda}$$

■

Example 7.17 Consider a manufacturing process that sequentially produces items, 
each of which is either defective or acceptable. The following type of sampling scheme 
is often employed in an attempt to detect and eliminate most of the defective items. 
Initially, each item is inspected and this continues until there are k consecutive items 
that are acceptable. At this point 100% inspection ends and each successive item is 
independently inspected with probability a. This partial inspection continues until a 
defective item is encountered, at which time 100% inspection is reinstituted, and the 
process begins anew. If each item is, independently, defective with probability q, 

(a) what proportion of items are inspected? 

(b) if defective items are removed when detected, what proportion of the remaining 
items are defective? 

Remark Before starting our analysis, note that the preceding inspection scheme was 
devised for situations in which the probability of producing a defective item changed 
over time. It was hoped that 100% inspection would correlate with times at which the 
defect probability was large and partial inspection when it was small. However, it is 
still important to see how such a scheme would work in the extreme case where the 
defect probability remains constant throughout. 

Solution: We begin our analysis by noting that we can treat the preceding as a
renewal reward process with a new cycle starting each time 100% inspection is
instituted. We then have

$$\text{proportion of items inspected} = \frac{E[\text{number inspected in a cycle}]}{E[\text{number produced in a cycle}]}$$

Let $N_k$ denote the number of items inspected until there are $k$ consecutive acceptable
items. Once partial inspection begins (that is, after $N_k$ items have been produced),
since each inspected item will be defective with probability $q$, it follows that the
expected number that will have to be inspected to find a defective item is $1/q$.
Hence,

$$E[\text{number inspected in a cycle}] = E[N_k] + \frac{1}{q}$$
In addition, since at partial inspection each item produced will, independently, be
inspected and found to be defective with probability $aq$, it follows that the expected
number of items produced until one is inspected and found to be defective is $1/(aq)$, and so

$$E[\text{number produced in a cycle}] = E[N_k] + \frac{1}{aq}$$

Also, as $E[N_k]$ is the expected number of trials needed to obtain $k$ acceptable items
in a row when each item is acceptable with probability $p = 1 - q$, it follows from
Example 3.14 that

$$E[N_k] = \frac{1}{p} + \frac{1}{p^2} + \cdots + \frac{1}{p^k} = \frac{(1/p)^k - 1}{q}$$

Hence, we obtain

$$P_I \equiv \text{proportion of items that are inspected} = \frac{(1/p)^k}{(1/p)^k - 1 + 1/a}$$

To answer (b), note first that since each item produced is defective with probability
$q$ it follows that the proportion of items that are both inspected and found to be
defective is $qP_I$. Hence, for $N$ large, out of the first $N$ items produced there will
be (approximately) $NqP_I$ that are discovered to be defective and thus removed. As
the first $N$ items will contain (approximately) $Nq$ defective items, it follows
that there will be $Nq - NqP_I$ defective items not discovered. Hence,

$$\text{proportion of the nonremoved items that are defective} \approx \frac{Nq(1 - P_I)}{N(1 - qP_I)}$$

As the approximation becomes exact as $N \to \infty$, we see that

$$\text{proportion of the nonremoved items that are defective} = \frac{q(1 - P_I)}{1 - qP_I}$$

Example 7.18 (The Average Age of a Renewal Process) Consider a renewal process
having interarrival distribution $F$ and define $A(t)$ to be the time at $t$ since the last
renewal. If renewals represent old items failing and being replaced by new ones, then
$A(t)$ represents the age of the item in use at time $t$. Since $S_{N(t)}$ represents the time of
the last event prior to or at time $t$, we have

$$A(t) = t - S_{N(t)}$$

We are interested in the average value of the age, that is, in

$$\lim_{s \to \infty} \frac{\int_0^s A(t)\, dt}{s}$$

To determine this quantity, we use renewal reward theory in the following way: Let
us assume that at any time we are being paid money at a rate equal to the age of the
renewal process at that time. That is, at time $t$, we are being paid at rate $A(t)$, and so
$\int_0^s A(t)\, dt$ represents our total earnings by time $s$. As everything starts over again when
a renewal occurs, it follows that

$$\frac{\int_0^s A(t)\, dt}{s} \to \frac{E[\text{reward during a renewal cycle}]}{E[\text{time of a renewal cycle}]}$$

Now, since the age of the renewal process a time $t$ into a renewal cycle is just $t$, we
have

$$\text{reward during a renewal cycle} = \int_0^X t\, dt = \frac{X^2}{2}$$

where $X$ is the time of the renewal cycle. Hence, we have that

$$\text{average value of age} \equiv \lim_{s \to \infty} \frac{\int_0^s A(t)\, dt}{s} = \frac{E[X^2]}{2E[X]} \tag{7.14}$$

where $X$ is an interarrival time having distribution function $F$. ■

Example 7.19 (The Average Excess of a Renewal Process) Another quantity associated
with a renewal process is $Y(t)$, the excess or residual time at time $t$. $Y(t)$ is
defined to equal the time from $t$ until the next renewal and, as such, represents the
remaining (or residual) life of the item in use at time $t$. The average value of the excess,
namely,

$$\lim_{s \to \infty} \frac{\int_0^s Y(t)\, dt}{s}$$

also can be easily obtained by renewal reward theory. To do so, suppose that we are
paid at time $t$ at a rate equal to $Y(t)$. Then our average reward per unit time will, by
renewal reward theory, be given by

$$\text{average value of excess} \equiv \lim_{s \to \infty} \frac{\int_0^s Y(t)\, dt}{s} = \frac{E[\text{reward during a cycle}]}{E[\text{length of a cycle}]}$$

Now, letting $X$ denote the length of a renewal cycle, we have

$$\text{reward during a cycle} = \int_0^X (X - t)\, dt = \frac{X^2}{2}$$

and thus the average value of the excess is

$$\text{average value of excess} = \frac{E[X^2]}{2E[X]}$$

which was the same result obtained for the average value of the age of a renewal process.
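
A small simulation can illustrate that the time averages of the age and of the excess both approach $E[X^2]/(2E[X])$. This is a sketch under stated assumptions: the interarrival distribution is taken to be uniform on (0, 4), for which $E[X^2]/(2E[X]) = 4/3$, and the sample size, seed, and sampling step are arbitrary choices. The function name `average_age_and_excess` is illustrative.

```python
import bisect
import random

def average_age_and_excess(renewal_times, step):
    """Average A(t) and Y(t) over the grid t = step, 2*step, ... below the last renewal."""
    horizon = renewal_times[-1]
    age_sum = excess_sum = 0.0
    count = 0
    t = step
    while t < horizon:
        i = bisect.bisect_right(renewal_times, t)   # index of the first renewal after t
        age_sum += t - renewal_times[i - 1]         # A(t): time since the last renewal
        excess_sum += renewal_times[i] - t          # Y(t): time until the next renewal
        count += 1
        t += step
    return age_sum / count, excess_sum / count

rng = random.Random(0)
times = [0.0]                                       # renewal epochs S_0 = 0, S_1, S_2, ...
for _ in range(20_000):
    times.append(times[-1] + rng.uniform(0.0, 4.0))

avg_age, avg_excess = average_age_and_excess(times, step=0.1)
theory = (16 / 3) / (2 * 2)                         # E[X^2] / (2 E[X]) for uniform(0, 4)
print(avg_age, avg_excess, theory)
```

Sampling on a regular grid approximates the time integrals in the example; both averages should land close to 4/3, up to simulation noise.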


Example 7.20 Suppose that passengers arrive at a bus stop according to a Poisson
process with rate $\lambda$. Suppose also that buses arrive according to a renewal process with
distribution function $F$, and that buses pick up all waiting passengers. Assuming that
the Poisson process of people arriving and the renewal process of buses arriving are
independent, find

(a) the average number of people who are waiting for a bus, averaged over all time; 
and 

(b) the average amount of time that a passenger waits, averaged over all passengers. 

Solution: We will solve this by using renewal reward processes. Say that a new
cycle begins each time a bus arrives. Let $T$ be the time of a cycle, and note that $T$ has
distribution function $F$. If we suppose that each passenger pays us money at a rate
of 1 per unit time while they wait for a bus, then the reward rate at any time is the
number waiting at that time, and so the average reward per unit time is the average
number of people that are waiting for a bus. Letting $R$ be the reward earned during
a cycle, the renewal reward theorem gives

$$\text{Average Number Waiting} = \frac{E[R]}{E[T]}$$

Let $N$ be the number of arrivals during a cycle. To determine $E[R]$, we will condition
on the values of both $T$ and $N$. Now,

$$E[R \mid T = t, N = n] = nt/2$$

which follows because given there are $n$ arrivals by time $t$ their set of arrival times
are distributed as $n$ independent uniform $(0, t)$ random variables, and so the average
amount received per passenger is $t/2$. Hence,

$$E[R \mid T, N] = NT/2$$

Taking expectations yields

$$E[R] = \frac{1}{2} E[NT]$$

To compute $E[NT]$, condition on $T$ to obtain

$$E[NT \mid T] = T E[N \mid T] = \lambda T^2$$

where the preceding follows because, given the time $T$ until the bus arrives, the
number of people waiting is Poisson distributed with mean $\lambda T$. Hence, upon taking
expectations of the preceding, we obtain

$$E[R] = \frac{1}{2} E[NT] = \lambda E[T^2]/2$$

which gives that

$$\text{Average Number Waiting} = \frac{\lambda E[T^2]}{2E[T]}$$

where $T$ has the interarrival distribution $F$.

To determine the average amount of time that a passenger waits note that, because
each passenger pays 1 per unit time while waiting for a bus, the total amount paid by
a passenger is the amount of time the passenger waits. Because $R$ is the total reward
earned in a cycle, it thus follows that, with $W_i$ being the waiting time of passenger $i$,

$$R = W_1 + \cdots + W_N$$

Now, if we consider the rewards earned from successive passengers, namely $W_1$,
$W_2, \ldots$, and imagine that the reward $W_i$ is earned at time $i$, then this sequence of
rewards constitutes a discrete time renewal reward process in which a new cycle
begins at time $N + 1$. Consequently, from renewal reward process theory and the
preceding identity, we see that

$$\lim_{n \to \infty} \frac{W_1 + \cdots + W_n}{n} = \frac{E[W_1 + \cdots + W_N]}{E[N]} = \frac{E[R]}{E[N]}$$

Using that

$$E[N] = E[E[N \mid T]] = E[\lambda T] = \lambda E[T]$$

along with the previously derived $E[R] = \lambda E[T^2]/2$ we obtain the result

$$\lim_{n \to \infty} \frac{W_1 + \cdots + W_n}{n} = \frac{E[T^2]}{2E[T]}$$

Because $\frac{E[T^2]}{2E[T]}$ is the average value of the excess for the renewal process of arriving
buses, the preceding equation yields the interesting result that the average waiting
time of a passenger is equal to the average time until the next bus arrives when
we average over all time. Because passengers are arriving according to a Poisson
process, this result is a special case of a general result, known as the PASTA principle,
to be presented in Chapter 8. The PASTA principle says that a system as seen by
Poisson arrivals is the same as the system as averaged over all time. (In our example,
the system refers to the time until the next bus.) ■












7.5 Regenerative Processes 


Consider a stochastic process $\{X(t), t \ge 0\}$ with state space $0, 1, 2, \ldots$, having the
property that there exist time points at which the process (probabilistically) restarts
itself. That is, suppose that with probability 1, there exists a time $T_1$, such that the
continuation of the process beyond $T_1$ is a probabilistic replica of the whole process
starting at 0. Note that this property implies the existence of further times $T_2, T_3, \ldots$,
having the same property as $T_1$. Such a stochastic process is known as a regenerative
process.

From the preceding, it follows that $T_1, T_2, \ldots$, constitute the arrival times of a
renewal process, and we shall say that a cycle is completed every time a renewal
occurs.

Examples

(1) A renewal process is regenerative, and $T_1$ represents the time of the first renewal.

(2) A recurrent Markov chain is regenerative, and $T_1$ represents the time of the first
transition into the initial state.


We are interested in determining the long-run proportion of time that a regenerative
process spends in state $j$. To obtain this quantity, let us imagine that we earn a reward
at a rate 1 per unit time when the process is in state $j$ and at rate 0 otherwise. That is,
if $I(s)$ represents the rate at which we earn at time $s$, then

$$I(s) = \begin{cases} 1, & \text{if } X(s) = j \\ 0, & \text{if } X(s) \ne j \end{cases}$$

and

$$\text{total reward earned by } t = \int_0^t I(s)\, ds$$

As the preceding is clearly a renewal reward process that starts over again at the cycle
time $T_1$, we see from Proposition 7.3 that

$$\text{average reward per unit time} = \frac{E[\text{reward by time } T_1]}{E[T_1]}$$

However, the average reward per unit time is just equal to the proportion of time that the
process is in state $j$. That is, we have the following.

Proposition 7.4 For a regenerative process, the long-run

$$\text{proportion of time in state } j = \frac{E[\text{amount of time in } j \text{ during a cycle}]}{E[\text{time of a cycle}]}$$

Remark If the cycle time $T_1$ is a continuous random variable, then it can be shown
by using an advanced theorem called the “key renewal theorem” that the preceding is
equal also to the limiting probability that the system is in state $j$ at time $t$. That is, if
$T_1$ is continuous, then

$$\lim_{t \to \infty} P\{X(t) = j\} = \frac{E[\text{amount of time in } j \text{ during a cycle}]}{E[\text{time of a cycle}]}$$
Example 7.21 Consider a positive recurrent continuous-time Markov chain that is
initially in state $i$. By the Markovian property, each time the process reenters state $i$ it
starts over again. Thus returns to state $i$ are renewals and constitute the beginnings of
new cycles. By Proposition 7.4, it follows that the long-run

$$\text{proportion of time in state } j = \frac{E[\text{amount of time in } j \text{ during an } i\text{-}i \text{ cycle}]}{\mu_{ii}}$$

where $\mu_{ii}$ represents the mean time to return to state $i$. If we take $j$ to equal $i$, then we
obtain

$$\text{proportion of time in state } i = \frac{1/v_i}{\mu_{ii}}$$


Example 7.22 (A Queueing System with Renewal Arrivals) Consider a waiting
time system in which customers arrive in accordance with an arbitrary renewal process
and are served one at a time by a single server having an arbitrary service distribution.
If we suppose that at time 0 the initial customer has just arrived, then $\{X(t), t \ge 0\}$ is
a regenerative process, where $X(t)$ denotes the number of customers in the system at
time $t$. The process regenerates each time a customer arrives and finds the server free.


Example 7.23 Although a system needs only a single machine to function, it maintains
an additional machine as a backup. A machine in use functions for a random time with
density function $f$ and then fails. If a machine fails while the other one is in working
condition, then the latter is put in use and, simultaneously, repair begins on the one that
just failed. If a machine fails while the other machine is in repair, then the newly failed
machine waits until the repair is completed; at that time the repaired machine is put in
use and, simultaneously, repair begins on the recently failed one. All repair times have
density function $g$. Find $P_0$, $P_1$, $P_2$, where $P_i$ is the long-run proportion of time that
exactly $i$ of the machines are in working condition.

Solution: Let us say that the system is in state $i$ whenever $i$ machines are in working
condition, $i = 0, 1, 2$. It is then easy to see that every time the system enters state 1
it probabilistically starts over. That is, the system restarts every time that a machine
is put in use while, simultaneously, repair begins on the other one. Say that a cycle
begins each time the system enters state 1. If we let $X$ denote the working time of
the machine put in use at the beginning of a cycle, and let $R$ be the repair time of the
other machine, then the length of the cycle, call it $T_c$, can be expressed as

$$T_c = \max(X, R)$$

The preceding follows when $X \le R$, because, in this case, the machine in use fails
before the other one has been repaired, and so a new cycle begins when that repair
is completed. Similarly, it follows when $R < X$, because then the repair occurs first,
and so a new cycle begins when the machine in use fails. Also, let $T_i$, $i = 0, 1, 2$,
be the amount of time that the system is in state $i$ during a cycle. Then, because the
amount of time during a cycle that neither machine is working is $R - X$ provided
that this quantity is positive or 0 otherwise, we have

$$T_0 = (R - X)^+$$

Similarly, because the amount of time during the cycle that a single machine is
working is $\min(X, R)$, we have

$$T_1 = \min(X, R)$$

Finally, because the amount of time during the cycle that both machines are working
is $X - R$ if this quantity is positive or 0 otherwise, we have

$$T_2 = (X - R)^+$$

Hence, we obtain

$$P_0 = \frac{E[(R - X)^+]}{E[\max(X, R)]}, \qquad P_1 = \frac{E[\min(X, R)]}{E[\max(X, R)]}, \qquad P_2 = \frac{E[(X - R)^+]}{E[\max(X, R)]}$$

That $P_0 + P_1 + P_2 = 1$ follows from the easily checked identity

$$\max(x, r) = \min(x, r) + (x - r)^+ + (r - x)^+$$

The preceding expectations can be computed as follows:

$$
\begin{aligned}
E[\max(X, R)] &= \int_0^\infty \int_0^\infty \max(x, r) f(x) g(r)\, dx\, dr \\
&= \int_0^\infty \int_0^r r f(x) g(r)\, dx\, dr + \int_0^\infty \int_r^\infty x f(x) g(r)\, dx\, dr \\
E[(R - X)^+] &= \int_0^\infty \int_0^\infty (r - x)^+ f(x) g(r)\, dx\, dr \\
&= \int_0^\infty \int_0^r (r - x) f(x) g(r)\, dx\, dr \\
E[\min(X, R)] &= \int_0^\infty \int_0^\infty \min(x, r) f(x) g(r)\, dx\, dr \\
&= \int_0^\infty \int_0^r x f(x) g(r)\, dx\, dr + \int_0^\infty \int_r^\infty r f(x) g(r)\, dx\, dr \\
E[(X - R)^+] &= \int_0^\infty \int_0^x (x - r) f(x) g(r)\, dr\, dx
\end{aligned}
$$
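
For a concrete instance of Example 7.23, the expectations have closed forms when both densities are exponential. This is an assumption made only for illustration (the example allows general $f$ and $g$): with $X$ exponential at rate `fail_rate` and $R$ exponential at rate `repair_rate`, independent, the memoryless property gives $E[\min(X, R)] = 1/(a + b)$ and $E[(X - R)^+] = \frac{b}{a + b} \cdot \frac{1}{a}$, and symmetrically for $E[(R - X)^+]$. The function name is illustrative.

```python
def two_machine_proportions(fail_rate, repair_rate):
    """Return (P0, P1, P2) when X ~ exp(fail_rate) and R ~ exp(repair_rate), independent."""
    a, b = fail_rate, repair_rate
    e_min = 1 / (a + b)                        # E[min(X, R)]
    e_x_minus_r = (b / (a + b)) * (1 / a)      # E[(X - R)^+] = P{R < X} * E[X - R | R < X]
    e_r_minus_x = (a / (a + b)) * (1 / b)      # E[(R - X)^+]
    e_max = e_min + e_x_minus_r + e_r_minus_x  # max = min + (x - r)^+ + (r - x)^+
    return e_r_minus_x / e_max, e_min / e_max, e_x_minus_r / e_max

p0, p1, p2 = two_machine_proportions(fail_rate=1.0, repair_rate=2.0)
print(p0, p1, p2, p0 + p1 + p2)
```

With failure rate 1 and repair rate 2 this gives proportions 1/7, 2/7, and 4/7, which sum to 1 as the identity for $\max(x, r)$ requires.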
7.5.1 Alternating Renewal Processes

Another example of a regenerative process is provided by what is known as an alternating
renewal process, which considers a system that can be in one of two states: on
or off. Initially it is on, and it remains on for a time $Z_1$; it then goes off and remains off
for a time $Y_1$. It then goes on for a time $Z_2$; then off for a time $Y_2$; then on, and so on.

We suppose that the random vectors $(Z_n, Y_n)$, $n \ge 1$ are independent and identically
distributed. That is, both the sequence of random variables $\{Z_n\}$ and the sequence $\{Y_n\}$
are independent and identically distributed; but we allow $Z_n$ and $Y_n$ to be dependent.
In other words, each time the process goes on, everything starts over again, but when
it then goes off, we allow the length of the off time to depend on the previous on time.

Let $E[Z] = E[Z_n]$ and $E[Y] = E[Y_n]$ denote, respectively, the mean lengths of an
on and off period.

We are concerned with $P_{\text{on}}$, the long-run proportion of time that the system is on.
If we let

$$X_n = Y_n + Z_n, \quad n \ge 1$$

then at time $X_1$ the process starts over again. That is, the process starts over again after
a complete cycle consisting of an on and an off interval. In other words, a renewal
occurs whenever a cycle is completed. Therefore, we obtain from Proposition 7.4 that

$$P_{\text{on}} = \frac{E[Z]}{E[Y] + E[Z]} = \frac{E[\text{on}]}{E[\text{on}] + E[\text{off}]} \tag{7.15}$$

Also, if we let $P_{\text{off}}$ denote the long-run proportion of time that the system is off, then

$$P_{\text{off}} = 1 - P_{\text{on}} = \frac{E[\text{off}]}{E[\text{on}] + E[\text{off}]} \tag{7.16}$$


Example 7.24 (A Production Process) One example of an alternating renewal process
is a production process (or a machine) that works for a time $Z_1$, then breaks down
and has to be repaired (which takes a time $Y_1$), then works for a time $Z_2$, then is down
for a time $Y_2$, and so on. If we suppose that the process is as good as new after each
repair, then this constitutes an alternating renewal process. It is worthwhile to note that
in this example it makes sense to suppose that the repair time will depend on the amount
of time the process had been working before breaking down. ■

Example 7.25 The rate a certain insurance company charges its policyholders alternates
between $r_1$ and $r_0$. A new policyholder is initially charged at a rate of $r_1$ per unit
time. When a policyholder paying at rate $r_1$ has made no claims for the most recent $s$
time units, then the rate charged becomes $r_0$ per unit time. The rate charged remains at
$r_0$ until a claim is made, at which time it reverts to $r_1$. Suppose that a given policyholder
lives forever and makes claims at times chosen according to a Poisson process with
rate $\lambda$, and find

(a) $P_i$, the proportion of time that the policyholder pays at rate $r_i$, $i = 0, 1$;

(b) the long-run average amount paid per unit time.

Solution: If we say that the system is “on” when the policyholder pays at rate $r_1$
and “off” when she pays at rate $r_0$, then this on-off system is an alternating renewal
process with a new cycle starting each time a claim is made. If $X$ is the time between
successive claims, then the on time in the cycle is the smaller of $s$ and $X$. (Note that
if $X < s$, then the off time in the cycle is 0.) Since $X$ is exponential with rate $\lambda$, the
preceding yields

$$
\begin{aligned}
E[\text{on time in cycle}] &= E[\min(X, s)] \\
&= \int_0^s x \lambda e^{-\lambda x}\, dx + s e^{-\lambda s} \\
&= \frac{1}{\lambda}\left(1 - e^{-\lambda s}\right)
\end{aligned}
$$

Since $E[X] = 1/\lambda$, we see that

$$P_1 = \frac{E[\text{on time in cycle}]}{E[X]} = 1 - e^{-\lambda s}$$

and

$$P_0 = 1 - P_1 = e^{-\lambda s}$$

The long-run average amount paid per unit time is

$$r_0 P_0 + r_1 P_1 = r_1 - (r_1 - r_0) e^{-\lambda s}$$

■
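
The closed forms of Example 7.25 can be checked numerically. The sketch below uses illustrative function names and hypothetical parameter values ($\lambda = 0.5$, $s = 3$, $r_1 = 10$, $r_0 = 6$) that do not come from the text; the numeric integral uses the tail identity $E[\min(X, s)] = \int_0^s P\{X > x\}\, dx$ with a midpoint rule.

```python
from math import exp

def premium_proportions(lam, s):
    """Return (P1, P0): long-run proportions of time paying at rates r1 and r0."""
    p0 = exp(-lam * s)
    return 1 - p0, p0

def average_premium(r1, r0, lam, s):
    """Long-run average amount paid per unit time, r0*P0 + r1*P1."""
    p1, p0 = premium_proportions(lam, s)
    return r0 * p0 + r1 * p1

def expected_on_time_numeric(lam, s, steps=100_000):
    """E[min(X, s)] as the integral of P{X > x} = exp(-lam*x) over (0, s), midpoint rule."""
    h = s / steps
    return sum(exp(-lam * (i + 0.5) * h) for i in range(steps)) * h

lam, s, r1, r0 = 0.5, 3.0, 10.0, 6.0      # hypothetical values for illustration
p1, p0 = premium_proportions(lam, s)
on_time = expected_on_time_numeric(lam, s)
print(p1, p0, average_premium(r1, r0, lam, s), on_time * lam)
```

Since $E[X] = 1/\lambda$, the product `on_time * lam` is a numerical estimate of $P_1$ and should match $1 - e^{-\lambda s}$.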

Example 7.26 (The Age of a Renewal Process) Suppose we are interested in determining
the proportion of time that the age of a renewal process is less than some
constant $c$. To do so, let a cycle correspond to a renewal, and say that the system is
“on” at time $t$ if the age at $t$ is less than or equal to $c$, and say it is “off” if the age at $t$
is greater than $c$. In other words, the system is “on” the first $c$ time units of a renewal
interval, and “off” the remaining time. Hence, letting $X$ denote a renewal interval, we
have, from Equation (7.15),

$$
\begin{aligned}
\text{proportion of time age is less than } c &= \frac{E[\min(X, c)]}{E[X]} \\
&= \frac{\int_0^\infty P\{\min(X, c) > x\}\, dx}{E[X]} \\
&= \frac{\int_0^c P\{X > x\}\, dx}{E[X]} \\
&= \frac{\int_0^c (1 - F(x))\, dx}{E[X]}
\end{aligned}
\tag{7.17}
$$

where $F$ is the distribution function of $X$ and where we have used the identity that for
a nonnegative random variable $Y$

$$E[Y] = \int_0^\infty P\{Y > x\}\, dx$$









Figure 7.3 Arrowheads indicate direction of time. (The figure marks the time $t$, the last
renewal before $t$, and the first renewal after $t$, together with the age $A(t)$ and the excess $Y(t)$.)


Example 7.27 ( The Excess of a Renewal Process) Let us now consider the long-run 
proportion of time that the excess of a renewal process is less than c. To determine 
this quantity, let a cycle correspond to a renewal interval and say that the system is on 
whenever the excess of the renewal process is greater than or equal to c and that it is 
off otherwise. In other words, whenever a renewal occurs the process goes on and stays 
on until the last c time units of the renewal interval when it goes off. Clearly this is an 
alternating renewal process, and so we obtain Equation (7.16) that 

/i [off time in cycle] 

long-run proportion of time the excess is less than c = - 

E [cycle time] 

If X is the length of a renewal interval, then since the system is off the last c time units 
of this interval, it follows that the off time in the cycle will equal min ( X , c). Thus, 


long-run proportion of time the excess is less than c 


£[min (X, c)] 
E\X] 

/o (1 - F(x))dx 
E[X] 


where the final equality follows from Equation (7.17). Thus, we see from the result of 
Example 7.23 that the long-run proportion of time that the excess is less than c and the 
long-run proportion of time that the age is less than c are equal. One way to understand 
this equivalence is to consider a renewal process that has been in operation for a long 
time and then observe it going backwards in time. In doing so, we observe a counting 
process where the times between successive events are independent random variables 
having distribution F. That is, when we observe a renewal process going backwards in 
time we again observe a renewal process having the same probability structure as the 
original. Since the excess (age) at any time for the backwards process corresponds to 
the age (excess) at that time for the original renewal process (see Figure 7.3), it follows 
that all long-run properties of the age and the excess must be equal. ■ 
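
As a numerical illustration of the formula in Example 7.27, the following Python sketch evaluates $\int_0^c (1 - F(x))\,dx / E[X]$ by the midpoint rule. The function name `excess_proportion`, the choice of integration rule, and the two sample interarrival distributions are assumptions made for illustration only; they do not come from the text.

```python
import math

def excess_proportion(survival, mean, c, steps=10_000):
    """Approximate (1 / E[X]) * integral_0^c (1 - F(x)) dx by the midpoint rule.

    survival(x) must return 1 - F(x); mean is E[X].
    """
    h = c / steps
    total = sum(survival((i + 0.5) * h) for i in range(steps))
    return total * h / mean

# Uniform(0, 1) interarrivals: 1 - F(x) = 1 - x on [0, 1], E[X] = 1/2.
uniform_value = excess_proportion(lambda x: max(0.0, 1.0 - x), 0.5, 0.25)

# Exponential interarrivals with rate 2: 1 - F(x) = exp(-2x), E[X] = 1/2.
# Here the proportion should agree with 1 - exp(-2c).
expo_value = excess_proportion(lambda x: math.exp(-2.0 * x), 0.5, 0.25)

print(uniform_value, expo_value)
```

For the exponential case the result reproduces the exponential distribution function itself, which is consistent with the memoryless property: the excess of a Poisson process is again exponential.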


Example 7.28 (The Busy Period of the M/G/∞ Queue) The infinite server queueing
system in which customers arrive according to a Poisson process with rate $\lambda$, and
have a general service distribution G, was analyzed in Section 5.3, where it was shown
that the number of customers in the system at time t is Poisson distributed with mean
$\lambda \int_0^t \bar{G}(y)\,dy$. If we say that the system is busy when there is at least one customer in
the system and is idle when the system is empty, find E[B], the expected length of a
busy period.

Solution: If we say that the system is on when there is at least one customer in the
system, and off when the system is empty, then we have an alternating renewal process.
Because $\int_0^\infty \bar{G}(t)\,dt = E[S]$, where E[S] is the mean of the service distribution
G, it follows from the result of Section 5.3 that

$$\lim_{t \to \infty} P\{\text{system off at } t\} = e^{-\lambda E[S]}$$

Consequently, from alternating renewal process theory we obtain

$$e^{-\lambda E[S]} = \frac{E[\text{off time in cycle}]}{E[\text{cycle time}]}$$

But when the system goes off, it remains off only up to the time of the next arrival,
giving that

$$E[\text{off time in cycle}] = 1/\lambda$$

Because

$$E[\text{on time in cycle}] = E[B]$$

we obtain

$$e^{-\lambda E[S]} = \frac{1/\lambda}{1/\lambda + E[B]}$$

or

$$E[B] = \frac{1}{\lambda}\left(e^{\lambda E[S]} - 1\right)$$ ■

If $\mu$ is the mean interarrival time, then the distribution function $F_e$, defined by

$$F_e(x) = \int_0^x \frac{1 - F(y)}{\mu}\,dy$$

is called the equilibrium distribution of F. From the preceding, it follows that $F_e(x)$
represents the long-run proportion of time that the age, and the excess, of the renewal
process is less than or equal to x.

Example 7.29 (An Inventory Example) Suppose that customers arrive at a specified 
store in accordance with a renewal process having interarrival distribution F. Suppose 
that the store stocks a single type of item and that each arriving customer desires a 
random amount of this commodity, with the amounts desired by the different customers 
being independent random variables having the common distribution G. The store uses 
the following (s, S) ordering policy: If its inventory level falls below s then it orders 
enough to bring its inventory up to S. That is, if the inventory after serving a customer 
is x, then the amount ordered is

$$\begin{cases} S - x, & \text{if } x < s \\ 0, & \text{if } x \ge s \end{cases}$$

The order is assumed to be instantaneously filled.
For a fixed value y, $s \le y \le S$, suppose that we are interested in determining
the long-run proportion of time that the inventory on hand is at least as large as y. 
To determine this quantity, let us say that the system is “on” whenever the inventory 
level is at least y and is “off” otherwise. With these definitions, the system will go on 
each time that a customer’s demand causes the store to place an order that results in 
its inventory level returning to S. Since whenever this occurs a customer must have 
just arrived it follows that the times until succeeding customers arrive will constitute 
a renewal process with interarrival distribution F; that is, the process will start over 
each time the system goes back on. Thus, the on and off periods so defined constitute 
an alternating renewal process, and from Equation (7.15) we have that 

$$\text{long-run proportion of time inventory} \ge y = \frac{E[\text{on time in a cycle}]}{E[\text{cycle time}]} \tag{7.18}$$

Now, if we let $D_1, D_2, \ldots$ denote the successive customer demands, and let

$$N_x = \min(n : D_1 + \cdots + D_n > S - x) \tag{7.19}$$

then it is the $N_y$ customer in the cycle that causes the inventory level to fall below y,
and it is the $N_s$ customer that ends the cycle. As a result, if we let $X_i$, $i \ge 1$, denote
the interarrival times of customers, then

$$\text{on time in a cycle} = \sum_{i=1}^{N_y} X_i \tag{7.20}$$

$$\text{cycle time} = \sum_{i=1}^{N_s} X_i \tag{7.21}$$

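Assuming that the interarrival times are independent of the successive demands, we
have that

$$E\left[\sum_{i=1}^{N_y} X_i\right] = E\left[E\left[\sum_{i=1}^{N_y} X_i \,\Big|\, N_y\right]\right] = E[N_y E[X]] = E[X]E[N_y]$$

Similarly,

$$E\left[\sum_{i=1}^{N_s} X_i\right] = E[X]E[N_s]$$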

Therefore, from Equations (7.18), (7.20), and (7.21) we see that

$$\text{long-run proportion of time inventory} \ge y = \frac{E[N_y]}{E[N_s]} \tag{7.22}$$

However, as the $D_i$, $i \ge 1$, are independent and identically distributed nonnegative
random variables with distribution G, it follows from Equation (7.19) that $N_x$ has the
same distribution as the index of the first event to occur after time S - x of a renewal
process having interarrival distribution G. That is, $N_x - 1$ would be the number of
renewals by time S - x of this process. Hence, we see that

$$E[N_y] = m(S - y) + 1, \qquad E[N_s] = m(S - s) + 1$$

where

$$m(t) = \sum_{n=1}^{\infty} G_n(t)$$

From Equation (7.22), we arrive at

$$\text{long-run proportion of time inventory} \ge y = \frac{m(S - y) + 1}{m(S - s) + 1}, \qquad s \le y \le S$$

For instance, if the customer demands are exponentially distributed with mean $1/\mu$,
then

$$\text{long-run proportion of time inventory} \ge y = \frac{\mu(S - y) + 1}{\mu(S - s) + 1}, \qquad s \le y \le S$$ ■
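
The closed form for exponentially distributed demands is easy to evaluate directly. The short Python sketch below does so; the function name `inventory_proportion` and the numerical values of the demand rate and the policy levels are hypothetical and chosen only to illustrate the formula.

```python
def inventory_proportion(mu, s, S, y):
    """Long-run proportion of time the inventory is at least y under an (s, S)
    policy, assuming exponential demands with mean 1/mu (renewal function m(t) = mu * t)."""
    if not s <= y <= S:
        raise ValueError("y must satisfy s <= y <= S")
    return (mu * (S - y) + 1) / (mu * (S - s) + 1)

# Hypothetical numbers: mean demand 2 (mu = 0.5), reorder below s = 10, order up to S = 50.
half_full = inventory_proportion(0.5, 10, 50, 30)   # (0.5*20 + 1) / (0.5*40 + 1) = 11/21
at_least_s = inventory_proportion(0.5, 10, 50, 10)  # y = s gives proportion 1
print(half_full, at_least_s)
```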


7.6 Semi-Markov Processes 

Consider a process that can be in state 1 or state 2 or state 3. It is initially in state 1 where
it remains for a random amount of time having mean $\mu_1$, then it goes to state 2
where it remains for a random amount of time having mean $\mu_2$, then it goes to
state 3 where it remains for a mean time $\mu_3$, then back to state 1, and so on. What
proportion of time is the process in state i, i = 1, 2, 3?

If we say that a cycle is completed each time the process returns to state 1, and if
we let the reward be the amount of time we spend in state i during that cycle, then the
preceding is a renewal reward process. Hence, from Proposition 7.3 we obtain that $P_i$,
the proportion of time that the process is in state i, is given by

$$P_i = \frac{\mu_i}{\mu_1 + \mu_2 + \mu_3}, \qquad i = 1, 2, 3$$

Similarly, if we had a process that could be in any of N states 1, 2, ..., N and that
moved from state $1 \to 2 \to 3 \to \cdots \to N - 1 \to N \to 1$, then the long-run
proportion of time that the process spends in state i is

$$P_i = \frac{\mu_i}{\mu_1 + \mu_2 + \cdots + \mu_N}, \qquad i = 1, 2, \ldots, N$$

where $\mu_i$ is the expected amount of time the process spends in state i during each visit.

Let us now generalize the preceding to the following situation. Suppose that a process
can be in any one of N states 1, 2, ..., N, and that each time it enters state i it remains
there for a random amount of time having mean $\mu_i$ and then makes a transition into
state j with probability $P_{ij}$. Such a process is called a semi-Markov process. Note that
if the amount of time that the process spends in each state before making a transition
is identically 1, then the semi-Markov process is just a Markov chain.

Let us calculate $P_i$ for a semi-Markov process. To do so, we first consider $\pi_i$, the
proportion of transitions that take the process into state i. Now, if we let $X_n$ denote
the state of the process after the nth transition, then $\{X_n, n \ge 0\}$ is a Markov chain
with transition probabilities $P_{ij}$, i, j = 1, 2, ..., N. Hence, $\pi_i$ will just be the limiting
(or stationary) probabilities for this Markov chain (Section 4.4). That is, $\pi_i$ will be the
unique nonnegative solution* of

$$\sum_{i=1}^{N} \pi_i = 1,$$

$$\pi_i = \sum_{j=1}^{N} \pi_j P_{ji}, \qquad i = 1, 2, \ldots, N \tag{7.23}$$

Now, since the process spends an expected time $\mu_i$ in state i whenever it visits that
state, it seems intuitive that $P_i$ should be a weighted average of the $\pi_i$ where $\pi_i$ is
weighted proportionately to $\mu_i$. That is,

$$P_i = \frac{\pi_i \mu_i}{\sum_{j=1}^{N} \pi_j \mu_j}, \qquad i = 1, 2, \ldots, N \tag{7.24}$$

where the $\pi_i$ are given as the solution to Equation (7.23).

* We shall assume that there exists a solution of Equation (7.23). That is, we assume that all of the states
in the Markov chain communicate.

Example 7.30 Consider a machine that can be in one of three states: good condition,
fair condition, or broken down. Suppose that a machine in good condition will remain
this way for a mean time $\mu_1$ and then will go to either the fair condition or the broken
condition with respective probabilities 3/4 and 1/4. A machine in fair condition will remain
that way for a mean time $\mu_2$ and then will break down. A broken machine will be
repaired, which takes a mean time $\mu_3$, and when repaired will be in good condition
with probability 2/3 and fair condition with probability 1/3. What proportion of time is the
machine in each state?

Solution: Letting the states be 1, 2, 3, we have by Equation (7.23) that the $\pi_i$ satisfy

$$\pi_1 + \pi_2 + \pi_3 = 1,$$
$$\pi_1 = \tfrac{2}{3}\pi_3,$$
$$\pi_2 = \tfrac{3}{4}\pi_1 + \tfrac{1}{3}\pi_3,$$
$$\pi_3 = \tfrac{1}{4}\pi_1 + \pi_2$$

The solution is

$$\pi_1 = \frac{4}{15}, \qquad \pi_2 = \frac{1}{3}, \qquad \pi_3 = \frac{2}{5}$$

Hence, from Equation (7.24) we obtain that $P_i$, the proportion of time the machine
is in state i, is given by

$$P_1 = \frac{4\mu_1}{4\mu_1 + 5\mu_2 + 6\mu_3}, \qquad P_2 = \frac{5\mu_2}{4\mu_1 + 5\mu_2 + 6\mu_3}, \qquad P_3 = \frac{6\mu_3}{4\mu_1 + 5\mu_2 + 6\mu_3}$$

For instance, if $\mu_1 = 5$, $\mu_2 = 2$, $\mu_3 = 1$, then the machine will be in good condition
5/9 of the time, in fair condition 5/18 of the time, in broken condition 1/6 of the time. ■
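
The two-step recipe of Equations (7.23) and (7.24) can be checked mechanically. The following Python sketch solves the stationary equations exactly with rational arithmetic and then weights by the mean sojourn times. The function names `stationary` and `time_proportions`, the Gauss-Jordan solver, and the encoding of the machine example as a matrix (transition probabilities 3/4, 1/4 out of the good state; 1 out of the fair state; 2/3, 1/3 out of the broken state) are assumptions made for this illustration, and the solver presumes, as the footnote does, that all states communicate.

```python
from fractions import Fraction

def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 exactly by Gauss-Jordan elimination."""
    n = len(P)
    # Rows 0..n-2: balance equations sum_j pi_j P[j][i] - pi_i = 0 (one is redundant).
    A = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n - 1)]
    # Last row: normalization pi_1 + ... + pi_n = 1.
    A.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def time_proportions(P, mu):
    """Equation (7.24): P_i = pi_i mu_i / sum_j pi_j mu_j."""
    pi = stationary(P)
    total = sum(p * m for p, m in zip(pi, mu))
    return [p * m / total for p, m in zip(pi, mu)]

# Machine example: states 1 = good, 2 = fair, 3 = broken (0-indexed below).
P = [[0, Fraction(3, 4), Fraction(1, 4)],
     [0, 0, 1],
     [Fraction(2, 3), Fraction(1, 3), 0]]
pi = stationary(P)
props = time_proportions(P, [5, 2, 1])
print(pi, props)
```

Exact fractions are used so that the output can be compared digit for digit with the hand computation; a floating-point linear solver would serve equally well for larger chains.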

Remark When the distributions of the amount of time spent in each state during a
visit are continuous, then $P_i$ also represents the limiting (as $t \to \infty$) probability that
the process will be in state i at time t.

Example 7.31 Consider a renewal process in which the interarrival distribution is
discrete and is such that

$$P\{X = i\} = p_i, \qquad i \ge 1$$

where X represents an interarrival random variable. Let L(t) denote the length of the
renewal interval that contains the point t (that is, if N(t) is the number of renewals
by time t and $X_n$ the nth interarrival time, then $L(t) = X_{N(t)+1}$). If we think of each
renewal as corresponding to the failure of a lightbulb (which is then replaced at the
beginning of the next period by a new bulb), then L(t) will equal i if the bulb in use at
time t dies in its ith period of use.

It is easy to see that L(t) is a semi-Markov process. To determine the proportion of
time that L(t) = j, note that each time a transition occurs (that is, each time a renewal
occurs) the next state will be j with probability $p_j$. That is, the transition probabilities
of the embedded Markov chain are $P_{ij} = p_j$. Hence, the limiting probabilities of this
embedded chain are given by

$$\pi_j = p_j$$

and, since the mean time the semi-Markov process spends in state j before a transition
occurs is j, it follows from Equation (7.24) that the long-run proportion of time the state is j is

$$P_j = \frac{j p_j}{\sum_i i p_i}$$ ■

7.7 The Inspection Paradox

Suppose that a piece of equipment, say, a battery, is installed and serves until it breaks 
down. Upon failure it is instantly replaced by a like battery, and this process continues 
without interruption. Letting N(t) denote the number of batteries that have failed by
time t, we have that $\{N(t), t \ge 0\}$ is a renewal process.

Suppose further that the distribution F of the lifetime of a battery is not known and 
is to be estimated by the following sampling inspection scheme. We fix some time t and 
observe the total lifetime of the battery that is in use at time t. Since F is the distribution 
of the lifetime for all batteries, it seems reasonable that it should be the distribution for 
this battery. However, this is the inspection paradox for it turns out that the battery in 
use at time t tends to have a larger lifetime than an ordinary battery. 

To understand the preceding so-called paradox, we reason as follows. In renewal 
theoretic terms what we are interested in is the length of the renewal interval containing 
the point t. That is, we are interested in $X_{N(t)+1} = S_{N(t)+1} - S_{N(t)}$ (see Figure 7.2).
To calculate the distribution of $X_{N(t)+1}$ we condition on the time of the last renewal
prior to (or at) time t. That is,

$$P\{X_{N(t)+1} > x\} = E\left[P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\}\right]$$

where we recall (Figure 7.2) that $S_{N(t)}$ is the time of the last renewal prior to (or at) t.
Since there are no renewals between t - s and t, it follows that $X_{N(t)+1}$ must be larger
than x if s > x. That is,

$$P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\} = 1 \quad \text{if } s > x \tag{7.25}$$

On the other hand, suppose that $s \le x$. As before, we know that a renewal occurred at
time t - s and no additional renewals occurred between t - s and t, and we ask for the
probability that no renewals occur for an additional time x - s. That is, we are asking
for the probability that an interarrival time will be greater than x given that it is greater
than s. Therefore, for $s \le x$,

$$\begin{aligned}
P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\}
&= P\{\text{interarrival time} > x \mid \text{interarrival time} > s\} \\
&= P\{\text{interarrival time} > x\} / P\{\text{interarrival time} > s\} \\
&= \frac{1 - F(x)}{1 - F(s)} \\
&\ge 1 - F(x)
\end{aligned} \tag{7.26}$$

Hence, from Equations (7.25) and (7.26) we see that, for all s,

$$P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\} \ge 1 - F(x)$$

Taking expectations on both sides yields

$$P\{X_{N(t)+1} > x\} \ge 1 - F(x) \tag{7.27}$$

Figure 7.4

However, 1 - F(x) is the probability that an ordinary renewal interval is larger than
x, that is, $1 - F(x) = P\{X_n > x\}$, and thus Equation (7.27) is a statement of the
inspection paradox that the length of the renewal interval containing the point t tends
to be larger than an ordinary renewal interval.

Remark To obtain an intuitive feel for the so-called inspection paradox, reason as 
follows. We think of the whole line being covered by renewal intervals, one of which 
covers the point t. Is it not more likely that a larger interval, as opposed to a shorter
interval, covers the point t?

We can explicitly calculate the distribution of $X_{N(t)+1}$ when the renewal process is a
Poisson process. (Note that, in the general case, we did not need to calculate explicitly
$P\{X_{N(t)+1} > x\}$ to show that it was at least as large as 1 - F(x).) To do so we write

$$X_{N(t)+1} = A(t) + Y(t)$$

where A(t) denotes the time from t since the last renewal, and Y(t) denotes the time
from t until the next renewal (see Figure 7.4). A(t) is the age of the process at time t
(in our example it would be the age at time t of the battery in use at time t), and Y(t) is
the excess life of the process at time t (it is the additional time from t until the battery
fails). Of course, it is true that $A(t) = t - S_{N(t)}$, and $Y(t) = S_{N(t)+1} - t$.

To calculate the distribution of $X_{N(t)+1}$ we first note the important fact that, for a
Poisson process, A(t) and Y(t) are independent. This follows since by the memoryless
property of the Poisson process, the time from t until the next renewal will be exponentially
distributed and will be independent of all that has previously occurred (including,
in particular, A(t)). In fact, this shows that if $\{N(t), t \ge 0\}$ is a Poisson process with
rate $\lambda$, then

$$P\{Y(t) \le x\} = 1 - e^{-\lambda x} \tag{7.28}$$

The distribution of A(t) may be obtained as follows

$$P\{A(t) > x\} = \begin{cases} P\{0 \text{ renewals in } [t - x, t]\}, & \text{if } x \le t \\ 0, & \text{if } x > t \end{cases} \;=\; \begin{cases} e^{-\lambda x}, & \text{if } x \le t \\ 0, & \text{if } x > t \end{cases}$$

or, equivalently,

$$P\{A(t) \le x\} = \begin{cases} 1 - e^{-\lambda x}, & x \le t \\ 1, & x > t \end{cases} \tag{7.29}$$

Hence, by the independence of Y(t) and A(t) the distribution of $X_{N(t)+1}$ is just the
convolution of the exponential distribution seen in Equation (7.28) and the distribution
of Equation (7.29). It is interesting to note that for t large, A(t) approximately has an
exponential distribution. Thus, for t large, $X_{N(t)+1}$ has the distribution of the convolution
of two identically distributed exponential random variables, which by Section
5.2.3 is the gamma distribution with parameters $(2, \lambda)$. In particular, for t large, the
expected length of the renewal interval containing the point t is approximately twice
the expected length of an ordinary renewal interval.

Using the results obtained in Examples 7.16 and 7.17 concerning the average values 
of the age and of the excess, it follows from the identity 


$$X_{N(t)+1} = A(t) + Y(t)$$

that the average length of the renewal interval containing a specified point is

$$\lim_{s \to \infty} \frac{\int_0^s X_{N(t)+1}\,dt}{s} = \frac{E[X^2]}{E[X]}$$

where X has the interarrival distribution. Because, except for when X is a constant,
$E[X^2] > (E[X])^2$, this average value is, as expected from the inspection paradox,
greater than the expected value of an ordinary renewal interval.
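
To make the size of this effect concrete, the Python sketch below compares $E[X]$ with $E[X^2]/E[X]$ for a discrete lifetime distribution. The helper name `mean_sampled_interval` and the two-point lifetime distribution (a battery lasting 1 or 9 time units with equal probability) are hypothetical choices for illustration.

```python
from fractions import Fraction

def mean_sampled_interval(pmf):
    """Long-run average length of the renewal interval covering a point: E[X^2] / E[X].

    pmf maps each possible interarrival value to its probability.
    """
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment / mean

# Hypothetical battery lifetimes: 1 or 9 time units, each with probability 1/2.
lifetimes = {1: Fraction(1, 2), 9: Fraction(1, 2)}
ordinary = sum(x * p for x, p in lifetimes.items())   # E[X] = 5
sampled = mean_sampled_interval(lifetimes)            # E[X^2] / E[X] = 41/5
print(ordinary, sampled)
```

In this example an ordinary battery lasts 5 units on average, whereas the battery found in use at a typical time has mean lifetime 8.2, because long intervals cover more of the time axis.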

We can use an alternating renewal process argument to determine the long-run 
proportion of time that $X_{N(t)+1}$ is greater than c. To do so, let a cycle correspond to a
renewal interval, and say that the system is on at time t if the renewal interval containing
t is of length greater than c (that is, if $X_{N(t)+1} > c$), and say that the system is off at
time t otherwise. In other words, the system is always on during a cycle if the cycle
time exceeds c or is always off during the cycle if the cycle time is less than c. Thus,
if X is the cycle time, we have

$$\text{on time in cycle} = \begin{cases} X, & \text{if } X > c \\ 0, & \text{if } X \le c \end{cases}$$

Therefore, we obtain from alternating renewal process theory that

$$\text{long-run proportion of time } X_{N(t)+1} > c = \frac{E[\text{on time in cycle}]}{E[\text{cycle time}]} = \frac{\int_c^\infty x f(x)\,dx}{E[X]}$$

where f is the density function of an interarrival.


7.8 Computing the Renewal Function 

The difficulty with attempting to use the identity 

$$m(t) = \sum_{n=1}^{\infty} F_n(t)$$

to compute the renewal function is that the determination of $F_n(t) = P\{X_1 + \cdots + X_n \le t\}$
requires the computation of an n-dimensional integral. Following, we present
an effective algorithm that requires as inputs only one-dimensional integrals.

Let Y be an exponential random variable having rate $\lambda$, and suppose that Y is
independent of the renewal process $\{N(t), t \ge 0\}$. We start by determining E[N(Y)],
the expected number of renewals by the random time Y. To do so, we first condition
on $X_1$, the time of the first renewal. This yields

$$E[N(Y)] = \int_0^\infty E[N(Y) \mid X_1 = x] f(x)\,dx \tag{7.30}$$

where f is the interarrival density. To determine $E[N(Y) \mid X_1 = x]$, we now condition
on whether or not Y exceeds x. Now, if Y < x, then as the first renewal occurs at time
x, it follows that the number of renewals by time Y is equal to 0. On the other hand, if
we are given that x < Y, then the number of renewals by time Y will equal 1 (the one
at x) plus the number of additional renewals between x and Y. But by the memoryless
property of exponential random variables, it follows that, given that Y > x, the amount
by which it exceeds x is also exponential with rate $\lambda$, and so given that Y > x the
number of renewals between x and Y will have the same distribution as N(Y). Hence,

$$E[N(Y) \mid X_1 = x, Y < x] = 0,$$
$$E[N(Y) \mid X_1 = x, Y > x] = 1 + E[N(Y)]$$

and so,

$$\begin{aligned}
E[N(Y) \mid X_1 = x] &= E[N(Y) \mid X_1 = x, Y < x]\,P\{Y < x \mid X_1 = x\} \\
&\quad + E[N(Y) \mid X_1 = x, Y > x]\,P\{Y > x \mid X_1 = x\} \\
&= E[N(Y) \mid X_1 = x, Y > x]\,P\{Y > x\} \quad \text{since } Y \text{ and } X_1 \text{ are independent} \\
&= (1 + E[N(Y)])e^{-\lambda x}
\end{aligned}$$

Substituting this into Equation (7.30) gives

$$E[N(Y)] = (1 + E[N(Y)]) \int_0^\infty e^{-\lambda x} f(x)\,dx$$

or

$$E[N(Y)] = \frac{E[e^{-\lambda X}]}{1 - E[e^{-\lambda X}]} \tag{7.31}$$

where X has the renewal interarrival distribution.

If we let $\lambda = 1/t$, then Equation (7.31) presents an expression for the expected
number of renewals (not by time t, but) by a random exponentially distributed time with
mean t. However, as such a random variable need not be close to its mean (its variance
is $t^2$), Equation (7.31) need not be particularly close to m(t). To obtain an accurate
approximation suppose that $Y_1, Y_2, \ldots, Y_n$ are independent exponentials with rate $\lambda$
and suppose they are also independent of the renewal process. Let, for r = 1, ..., n,

$$m_r = E[N(Y_1 + \cdots + Y_r)]$$

To compute an expression for $m_r$, we again start by conditioning on $X_1$, the time of
the first renewal:

$$m_r = \int_0^\infty E[N(Y_1 + \cdots + Y_r) \mid X_1 = x] f(x)\,dx \tag{7.32}$$


To determine the foregoing conditional expectation, we now condition on the number
of partial sums $\sum_{i=1}^{j} Y_i$, j = 1, ..., r, that are less than x. Now, if all r partial sums
are less than x (that is, if $\sum_{i=1}^{r} Y_i < x$) then clearly the number of renewals by time
$\sum_{i=1}^{r} Y_i$ is 0. On the other hand, given that k, k < r, of these partial sums are less than
x, it follows from the lack of memory property of the exponential that the number of
renewals by time $\sum_{i=1}^{r} Y_i$ will have the same distribution as 1 plus $N(Y_{k+1} + \cdots + Y_r)$.
Hence,

$$E\left[N(Y_1 + \cdots + Y_r) \,\Big|\, X_1 = x,\ k \text{ of the sums } \sum_{i=1}^{j} Y_i \text{ are less than } x\right] = \begin{cases} 0, & \text{if } k = r \\ 1 + m_{r-k}, & \text{if } k < r \end{cases} \tag{7.33}$$

To determine the distribution of the number of the partial sums that are less than x,
note that the successive values of these partial sums $\sum_{i=1}^{j} Y_i$, j = 1, ..., r, have the
same distribution as the first r event times of a Poisson process with rate $\lambda$ (since each
successive partial sum is the previous sum plus an independent exponential with rate $\lambda$).
Hence, it follows that, for k < r,

$$P\left\{k \text{ of the partial sums } \sum_{i=1}^{j} Y_i \text{ are less than } x \,\Big|\, X_1 = x\right\} = \frac{e^{-\lambda x}(\lambda x)^k}{k!} \tag{7.34}$$

Upon substitution of Equations (7.33) and (7.34) into Equation (7.32), we obtain

$$m_r = \int_0^\infty \sum_{k=0}^{r-1} (1 + m_{r-k})\, e^{-\lambda x} \frac{(\lambda x)^k}{k!}\, f(x)\,dx$$

or, equivalently,

$$m_r = \frac{\sum_{k=1}^{r-1} (1 + m_{r-k})\, E[X^k e^{-\lambda X}]\,(\lambda^k / k!) + E[e^{-\lambda X}]}{1 - E[e^{-\lambda X}]} \tag{7.35}$$

Table 7.1 Approximating m(t)

                 Exact                        Approximation
F_i     t        m(t)        n = 1      n = 3      n = 10     n = 25     n = 50
1       1        0.2838      0.3333     0.3040     0.2903     0.2865     0.2852
1       2        0.7546      0.8000     0.7697     0.7586     0.7561     0.7553
1       5        2.250       2.273      2.253      2.250      2.250      2.250
1       10       4.75        4.762      4.751      4.750      4.750      4.750
2       0.1      0.1733      0.1681     0.1687     0.1689     0.1690     —
2       0.3      0.5111      0.4964     0.4997     0.5010     0.5014     —
2       0.5      0.8404      0.8182     0.8245     0.8273     0.8281     0.8283
2       1        1.6400      1.6087     1.6205     1.6261     1.6277     1.6283
2       3        4.7389      4.7143     4.7294     4.7350     4.7363     4.7367
2       10       15.5089     15.5000    15.5081    15.5089    15.5089    15.5089
3       0.1      0.2819      0.2692     0.2772     0.2804     0.2813     —
3       0.3      0.7638      0.7105     0.7421     0.7567     0.7609     —
3       1        2.0890      2.0000     2.0556     2.0789     2.0850     2.0870
3       3        5.4444      5.4000     5.4375     5.4437     5.4442     5.4443

If we set $\lambda = n/t$, then starting with $m_1$ given by Equation (7.31), we can use Equation
(7.35) to recursively compute $m_2, \ldots, m_n$. The approximation of m(t) = E[N(t)] is
given by $m_n = E[N(Y_1 + \cdots + Y_n)]$. Since $Y_1 + \cdots + Y_n$ is the sum of n independent
exponential random variables each with mean t/n, it follows that it is (gamma) distributed
with mean t and variance $nt^2/n^2 = t^2/n$. Hence, by choosing n large, $\sum_{i=1}^{n} Y_i$
will be a random variable having most of its probability concentrated about t, and so
$E[N(\sum_{i=1}^{n} Y_i)]$ should be quite close to E[N(t)]. (Indeed, if m(t) is continuous at t,
it can be shown that these approximations converge to m(t) as n goes to infinity.)
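
The recursion is short enough to sketch in Python. In the block below, `approx_renewal` implements Equations (7.31) and (7.35) with $\lambda = n/t$; the caller supplies a function returning the one-dimensional quantities $E[X^k e^{-\lambda X}]$. The function names and the calling convention are assumptions of this sketch. The sample moment function assumes the interarrival density $f(x) = xe^{-x}$, for which $E[X^k e^{-\lambda X}] = (k+1)!/(1+\lambda)^{k+2}$ follows from the gamma integral.

```python
import math

def approx_renewal(t, n, weighted_moment):
    """Return m_n, the approximation to m(t), via Equations (7.31) and (7.35).

    weighted_moment(k, lam) must return E[X**k * exp(-lam * X)].
    """
    lam = n / t
    # a[k] = E[X^k e^{-lam X}] * lam^k / k!, the coefficient appearing in (7.35).
    a = [weighted_moment(k, lam) * lam**k / math.factorial(k) for k in range(n)]
    m = [0.0] * (n + 1)  # m[r] holds E[N(Y_1 + ... + Y_r)]; m[0] is unused.
    for r in range(1, n + 1):
        # For r = 1 the sum is empty and this reduces to Equation (7.31).
        s = sum((1 + m[r - k]) * a[k] for k in range(1, r))
        m[r] = (s + a[0]) / (1 - a[0])
    return m[n]

def gamma21_moment(k, lam):
    """E[X^k e^{-lam X}] for the density f(x) = x e^{-x}: (k+1)! / (1+lam)^(k+2)."""
    return math.factorial(k + 1) / (1 + lam) ** (k + 2)

for n in (1, 3, 10, 25, 50):
    print(n, round(approx_renewal(1.0, n, gamma21_moment), 4))
```

Each step of the recursion costs O(r) operations, so computing $m_n$ takes O(n²) arithmetic once the n moments are available. For densities without closed-form moments, `weighted_moment` could be replaced by a numerical quadrature routine.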

Example 7.32 Table 7.1 compares the approximation with the exact value for the 
distributions with densities $f_i$, i = 1, 2, 3, which are given by

$$f_1(x) = xe^{-x},$$
$$1 - F_2(x) = 0.3e^{-x} + 0.7e^{-2x},$$
$$1 - F_3(x) = 0.5e^{-x} + 0.5e^{-5x}$$ ■

7.9 Applications to Patterns 

A counting process with independent interarrival times $X_1, X_2, \ldots$ is said to be a
delayed or general renewal process if $X_1$ has a different distribution from the identically
distributed random variables $X_2, X_3, \ldots$. That is, a delayed renewal process is a
renewal process in which the first interarrival time has a different distribution than the
others. Delayed renewal processes often arise in practice and it is important to note that
all of the limiting theorems about N(t), the number of events by time t, remain valid.
For instance, it remains true that

$$\frac{E[N(t)]}{t} \to \frac{1}{\mu} \quad \text{and} \quad \frac{\text{Var}(N(t))}{t} \to \frac{\sigma^2}{\mu^3} \quad \text{as } t \to \infty$$

where $\mu$ and $\sigma^2$ are the expected value and variance of the interarrivals $X_i$, i > 1.


7.9.1 Patterns of Discrete Random Variables 


Let $X_1, X_2, \ldots$ be independent with $P\{X_i = j\} = p(j)$, $j \ge 0$, and let T denote the
first time the pattern $x_1, \ldots, x_r$ occurs. If we say that a renewal occurs at time n, $n \ge r$,
if $(X_{n-r+1}, \ldots, X_n) = (x_1, \ldots, x_r)$, then N(n), $n \ge 1$, is a delayed renewal process,
where N(n) denotes the number of renewals by time n. It follows that

$$\frac{E[N(n)]}{n} \to \frac{1}{\mu} \quad \text{as } n \to \infty \tag{7.36}$$

$$\frac{\text{Var}(N(n))}{n} \to \frac{\sigma^2}{\mu^3} \quad \text{as } n \to \infty \tag{7.37}$$

where $\mu$ and $\sigma$ are, respectively, the mean and standard deviation of the time between
successive renewals. Whereas, in Section 3.6.4, we showed how to compute the expected
value of T, we will now show how to use renewal theory results to compute both its
mean and its variance.

To begin, let I(i) equal 1 if there is a renewal at time i and let it be 0 otherwise,
$i \ge r$. Also, let $p = \prod_{i=1}^{r} p(x_i)$. Since,

$$P\{I(i) = 1\} = P\{X_{i-r+1} = i_1, \ldots, X_i = i_r\} = p$$

it follows that I(i), $i \ge r$, are Bernoulli random variables with parameter p. Now,

$$N(n) = \sum_{i=r}^{n} I(i)$$

so

$$E[N(n)] = \sum_{i=r}^{n} E[I(i)] = (n - r + 1)p$$

Dividing by n and then letting $n \to \infty$ gives, from Equation (7.36),

$$\mu = 1/p \tag{7.38}$$

That is, the mean time between successive occurrences of the pattern is equal to 1/p.
Also,

$$\begin{aligned}
\frac{\text{Var}(N(n))}{n} &= \frac{1}{n}\sum_{i=r}^{n} \text{Var}(I(i)) + \frac{2}{n}\sum_{i=r}^{n-1}\ \sum_{n \ge j > i} \text{Cov}(I(i), I(j)) \\
&= \frac{n - r + 1}{n}\,p(1 - p) + \frac{2}{n}\sum_{i=r}^{n-1}\ \sum_{i < j \le \min(i + r - 1,\, n)} \text{Cov}(I(i), I(j))
\end{aligned}$$

where the final equality used the fact that I(i) and I(j) are independent, and thus
have zero covariance, when $|i - j| \ge r$. Letting $n \to \infty$, and using the fact that
Cov(I(i), I(j)) depends on i and j only through |j - i|, gives

$$\frac{\text{Var}(N(n))}{n} \to p(1 - p) + 2\sum_{j=1}^{r-1} \text{Cov}(I(r), I(r + j))$$

Therefore, using Equations (7.37) and (7.38), we see that

$$\sigma^2 = p^{-2}(1 - p) + 2p^{-3}\sum_{j=1}^{r-1} \text{Cov}(I(r), I(r + j)) \tag{7.39}$$


Let us now consider the amount of “overlap” in the pattern. The overlap, equal to 
the number of values at the end of one pattern that can be used as the beginning part of 
the next pattern, is said to be of size k, k > 0, if

$$k = \max\{j < r : (i_{r-j+1}, \ldots, i_r) = (i_1, \ldots, i_j)\}$$

and is of size 0 if for all k = 1, ..., r - 1, $(i_{r-k+1}, \ldots, i_r) \ne (i_1, \ldots, i_k)$. Thus, for
instance, the pattern 0, 0, 1, 1 has overlap 0, whereas 0, 0, 1, 0, 0 has overlap 2. We
consider two cases.

Case 1 (The Pattern Has Overlap 0) In this case, N(n), $n \ge 1$, is an ordinary renewal
process and T is distributed as an interarrival time with mean $\mu$ and variance $\sigma^2$. Hence,
we have the following from Equation (7.38):

$$E[T] = \mu = \frac{1}{p} \tag{7.40}$$

Also, since two patterns cannot occur within a distance less than r of each other, it
follows that I(r)I(r + j) = 0 when $1 \le j \le r - 1$. Hence,

$$\text{Cov}(I(r), I(r + j)) = -E[I(r)]E[I(r + j)] = -p^2, \quad \text{if } 1 \le j \le r - 1$$

Hence, from Equation (7.39) we obtain

$$\text{Var}(T) = \sigma^2 = p^{-2}(1 - p) - 2p^{-3}(r - 1)p^2 = p^{-2} - (2r - 1)p^{-1} \tag{7.41}$$

Remark In cases of “rare” patterns, if the pattern hasn’t yet occurred by some time 
n, then it would seem that we would have no reason to believe that the remaining time 
would be much less than if we were just beginning from scratch. That is, it would seem 
that the distribution is approximately memoryless and would thus be approximately 
exponentially distributed. Thus, since the variance of an exponential is equal to its mean 
squared, we would expect when $\mu$ is large that $\text{Var}(T) \approx E^2[T]$, and this is borne out
by the preceding, which states that $\text{Var}(T) = E^2[T] - (2r - 1)E[T]$.

Example 7.33 Suppose we are interested in the number of times that a fair coin needs
to be flipped before the pattern h, h, t, h, t occurs. For this pattern, r = 5, p = 1/32, and
the overlap is 0. Hence, from Equations (7.40) and (7.41)

$$E[T] = 32, \qquad \text{Var}(T) = 32^2 - 9 \times 32 = 736,$$

and

$$\text{Var}(T)/E^2[T] = 0.71875$$

On the other hand, if p(i) = i/10, i = 1, 2, 3, 4 and the pattern is 1, 2, 1, 4, 1, 3, 2
then r = 7, p = 3/625,000, and the overlap is 0. Thus, again from Equations (7.40)
and (7.41), we see that in this case

$$E[T] = 208{,}333.33, \qquad \text{Var}(T) = 4.34 \times 10^{10},$$

$$\text{Var}(T)/E^2[T] = 0.99994$$ ■

Case 2 (The Overlap Is of Size k) In this case,

$$T = T_{i_1, \ldots, i_k} + T^*$$

where $T_{i_1, \ldots, i_k}$ is the time until the pattern $i_1, \ldots, i_k$ appears and $T^*$, distributed as
an interarrival time of the renewal process, is the additional time that it takes, starting
with $i_1, \ldots, i_k$, to obtain the pattern $i_1, \ldots, i_r$. Because these random variables are
independent, we have

$$E[T] = E[T_{i_1, \ldots, i_k}] + E[T^*] \tag{7.42}$$

$$\text{Var}(T) = \text{Var}(T_{i_1, \ldots, i_k}) + \text{Var}(T^*) \tag{7.43}$$

Now, from Equation (7.38)

$$E[T^*] = \mu = p^{-1} \tag{7.44}$$

Also, since no two renewals can occur within a distance r - k - 1 of each other, it
follows that I(r)I(r + j) = 0 if $1 \le j \le r - k - 1$. Therefore, from Equation (7.39)
we see that

$$\begin{aligned}
\text{Var}(T^*) = \sigma^2 &= p^{-2}(1 - p) + 2p^{-3}\left(\sum_{j=r-k}^{r-1} E[I(r)I(r + j)] - (r - 1)p^2\right) \\
&= p^{-2} - (2r - 1)p^{-1} + 2p^{-3}\sum_{j=r-k}^{r-1} E[I(r)I(r + j)]
\end{aligned} \tag{7.45}$$

The quantities E[I(r)I(r + j)] in Equation (7.45) can be calculated by considering
the particular pattern. To complete the calculation of the first two moments of T, we
then compute the mean and variance of $T_{i_1, \ldots, i_k}$ by repeating the same method.

Example 7.34 Suppose that we want to determine the number of flips of a fair coin
until the pattern h, h, t, h, h occurs. For this pattern, r = 5, p = 1/32, and the overlap
parameter is k = 2. Because

$$E[I(5)I(8)] = P\{h, h, t, h, h, t, h, h\} = \frac{1}{256}$$

$$E[I(5)I(9)] = P\{h, h, t, h, h, h, t, h, h\} = \frac{1}{512}$$

we see from Equations (7.44) and (7.45) that

$$E[T^*] = 32,$$

$$\text{Var}(T^*) = (32)^2 - 9(32) + 2(32)^3\left(\frac{1}{256} + \frac{1}{512}\right) = 1120$$

Hence, from Equations (7.42) and (7.43) we obtain

$$E[T] = E[T_{h,h}] + 32, \qquad \text{Var}(T) = \text{Var}(T_{h,h}) + 1120$$

Now, consider the pattern h, h. It has r = 2, p = 1/4, and overlap parameter 1. Since,
for this pattern, E[I(2)I(3)] = 1/8, we obtain, as in the preceding, that

$$E[T_{h,h}] = E[T_h] + 4,$$

$$\text{Var}(T_{h,h}) = \text{Var}(T_h) + 16 - 3(4) + 2\left(\frac{64}{8}\right) = \text{Var}(T_h) + 20$$

Finally, for the pattern h, which has r = 1, p = 1/2, we see from Equations (7.40) and
(7.41) that

$$E[T_h] = 2, \qquad \text{Var}(T_h) = 2$$

Putting it all together gives

$$E[T] = 38, \qquad \text{Var}(T) = 1142, \qquad \text{Var}(T)/E^2[T] = 0.79086$$ ■
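
The peeling-off procedure of Cases 1 and 2 is naturally recursive, and a small Python sketch may help make it concrete. The functions `overlap` and `pattern_mean_var` below are illustrative names, not notation from the text. The sketch assumes independent symbols with probabilities given by a dictionary `prob`, and it uses the observation that $E[I(r)I(r+j)]$ is nonzero only when the last r - j symbols of the pattern equal its first r - j symbols, in which case it equals p times the product of the probabilities of the last j symbols.

```python
from fractions import Fraction
from math import prod

def overlap(pattern):
    """Largest j < r such that the last j symbols equal the first j symbols (0 if none)."""
    r = len(pattern)
    for j in range(r - 1, 0, -1):
        if pattern[r - j:] == pattern[:j]:
            return j
    return 0

def pattern_mean_var(pattern, prob):
    """Return (E[T], Var(T)) for the first time T at which `pattern` occurs."""
    pattern = tuple(pattern)
    r = len(pattern)
    if r == 0:
        return Fraction(0), Fraction(0)
    p = prod(Fraction(prob[x]) for x in pattern)
    k = overlap(pattern)
    # Sum of E[I(r) I(r+j)] for j = r-k, ..., r-1; a shift j contributes only
    # when the pattern can overlap itself at that shift.
    joint = sum(
        p * prod(Fraction(prob[x]) for x in pattern[r - j:])
        for j in range(r - k, r)
        if pattern[j:] == pattern[:r - j]
    )
    mean_star = 1 / p                                          # Equation (7.44)
    var_star = 1 / p**2 - (2 * r - 1) / p + 2 * joint / p**3   # Equation (7.45)
    mean_head, var_head = pattern_mean_var(pattern[:k], prob)  # overlap part
    return mean_head + mean_star, var_head + var_star          # (7.42), (7.43)

fair = {"h": Fraction(1, 2), "t": Fraction(1, 2)}
print(pattern_mean_var("hhthh", fair))   # the pattern with overlap 2 treated above
print(pattern_mean_var("hhtht", fair))   # an overlap-0 pattern
```

When the overlap is 0 the sum is empty and the variance reduces to Equation (7.41), so the same function covers both cases.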

Example 7.35 Suppose that $P\{X_n = i\} = p_i$, and consider the pattern 0, 1, 2, 0, 1,
3, 0, 1. Then $p = p_0^3 p_1^3 p_2 p_3$, r = 8, and the overlap parameter is k = 2. Since

$$E[I(8)I(14)] = p_0^5 p_1^5 p_2^2 p_3^2$$
$$E[I(8)I(15)] = 0$$

we see from Equations (7.42) and (7.44) that

$$E[T] = E[T_{0,1}] + p^{-1}$$

and from Equations (7.43) and (7.45) that

$$\text{Var}(T) = \text{Var}(T_{0,1}) + p^{-2} - 15p^{-1} + 2p^{-1}(p_0 p_1)^{-1}$$

Now, the r and p values of the pattern 0, 1 are r(0, 1) = 2, $p(0, 1) = p_0 p_1$, and this
pattern has overlap 0. Hence, from Equations (7.40) and (7.41),

$$E[T_{0,1}] = (p_0 p_1)^{-1}, \qquad \text{Var}(T_{0,1}) = (p_0 p_1)^{-2} - 3(p_0 p_1)^{-1}$$

For instance, if $p_i = 0.2$, i = 0, 1, 2, 3 then

$$E[T] = 25 + 5^8 = 390{,}650$$
$$\text{Var}(T) = 625 - 75 + 5^{16} + 35 \times 5^8 = 1.526 \times 10^{11}$$
$$\text{Var}(T)/E^2[T] = 0.99996$$ ■

Remark It can be shown that T is a type of discrete random variable called new better
than used (NBU), which loosely means that if the pattern has not yet occurred by some
time n then the additional time until it occurs tends to be less than the time it would
take the pattern to occur if one started all over at that point. Such a random variable is
known to satisfy (see Proposition 9.6.1 of Ref. [4])

$$\text{Var}(T) \le E^2[T] - E[T] \le E^2[T]$$ ■

Now, suppose that there are s patterns, $A(1), \ldots, A(s)$, and that we are interested
in the mean time until one of these patterns occurs, as well as the probability mass
function of the one that occurs first. Let us assume, without any loss of generality, that
none of the patterns is contained in any of the others. (That is, we rule out such trivial
cases as $A(1) = h, h$ and $A(2) = h, h, t$.) To determine the quantities of interest, let
$T(i)$ denote the time until pattern $A(i)$ occurs, $i = 1, \ldots, s$, and let $T(i, j)$ denote
the additional time, starting with the occurrence of pattern $A(i)$, until pattern $A(j)$
occurs, $i \ne j$. Start by computing the expected values of these random variables. We
have already shown how to compute $E[T(i)]$, $i = 1, \ldots, s$. To compute $E[T(i, j)]$,
use the same approach, taking into account any “overlap” between the latter part of
$A(i)$ and the beginning part of $A(j)$. For instance, suppose $A(1) = 0, 0, 1, 2, 0, 3$, and
$A(2) = 2, 0, 3, 2, 0$. Then

$$T(2) = T_{2,0,3} + T(1, 2)$$

where $T_{2,0,3}$ is the time to obtain the pattern 2, 0, 3. Hence,

$$E[T(1, 2)] = E[T(2)] - E[T_{2,0,3}] = (p_2^2 p_0^2 p_3)^{-1} + (p_0 p_2)^{-1} - (p_2 p_0 p_3)^{-1}$$

So, suppose now that all of the quantities $E[T(i)]$ and $E[T(i, j)]$ have been computed.
Let

$$M = \min_i T(i)$$

and let

$$P(i) = P\{M = T(i)\}, \quad i = 1, \ldots, s$$

That is, $P(i)$ is the probability that pattern $A(i)$ is the first pattern to occur. Now, for
each j we will derive an equation that $E[T(j)]$ satisfies as follows:

$$\begin{aligned}
E[T(j)] &= E[M] + E[T(j) - M] \\
&= E[M] + \sum_{i : i \ne j} E[T(i, j)]P(i), \quad j = 1, \ldots, s
\end{aligned} \tag{7.46}$$

where the final equality is obtained by conditioning on which pattern occurs first. But
Equations (7.46) along with the equation

$$\sum_{i=1}^{s} P(i) = 1$$

constitute a set of $s + 1$ equations in the $s + 1$ unknowns $E[M]$, $P(i)$, $i = 1, \ldots, s$.
Solving them yields the desired quantities.

Example 7.36 Suppose that we continually flip a fair coin. With $A(1) = h, t, t, h, h$
and $A(2) = h, h, t, h, t$, we have

$$E[T(1)] = 32 + E[T_h] = 34,$$

$$E[T(2)] = 32,$$

$$E[T(1, 2)] = E[T(2)] - E[T_{h,h}] = 32 - (4 + E[T_h]) = 26,$$

$$E[T(2, 1)] = E[T(1)] - E[T_{h,t}] = 34 - 4 = 30$$

Hence, we need to solve the equations

$$34 = E[M] + 30P(2),$$

$$32 = E[M] + 26P(1),$$

$$1 = P(1) + P(2)$$

These equations are easily solved, and yield the values

$$P(1) = P(2) = \tfrac{1}{2}, \qquad E[M] = 19$$

Note that although the mean time for pattern $A(2)$ is less than that for $A(1)$, each has
the same chance of occurring first. ■
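For more than two patterns the linear system formed by Equations (7.46) and the normalization condition is tedious to solve by hand. The sketch below is one possible way to solve it exactly with Gauss-Jordan elimination over rational numbers. The function name `solve_first_pattern` and the input layout (a list of the means E[T(j)] and a matrix of the means E[T(i, j)]) are assumptions made for illustration, and the inputs are assumed to define a nonsingular system.

```python
from fractions import Fraction


def solve_first_pattern(mean_T, mean_T_between):
    """Solve Equations (7.46) together with sum_i P(i) = 1.

    mean_T[j]            = E[T(j)]
    mean_T_between[i][j] = E[T(i, j)] for i != j (diagonal entries are ignored)
    Returns (E[M], [P(1), ..., P(s)]) as exact Fractions.
    """
    s = len(mean_T)
    n = s + 1  # unknowns: E[M], P(1), ..., P(s)
    rows = []
    for j in range(s):
        coeffs = [Fraction(1)]
        for i in range(s):
            coeffs.append(Fraction(0) if i == j else Fraction(mean_T_between[i][j]))
        rows.append(coeffs + [Fraction(mean_T[j])])
    rows.append([Fraction(0)] + [Fraction(1)] * s + [Fraction(1)])

    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]

    solution = [rows[r][n] for r in range(n)]
    return solution[0], solution[1:]


# Example 7.36: E[T(1)] = 34, E[T(2)] = 32, E[T(1,2)] = 26, E[T(2,1)] = 30.
expected_min, first_probs = solve_first_pattern([34, 32], [[None, 26], [30, None]])
print(expected_min, first_probs)
```

For the data of Example 7.36 this reproduces E[M] = 19 and P(1) = P(2) = 1/2.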

Equations (7.46) are easy to solve when there are no overlaps in any of the patterns.
In this case, for all $i \ne j$

$$E[T(i, j)] = E[T(j)]$$

so Equations (7.46) reduce to

$$E[T(j)] = E[M] + (1 - P(j))E[T(j)]$$

or

$$P(j) = E[M]/E[T(j)]$$

Summing the preceding over all j yields

$$E[M] = \frac{1}{\sum_{j=1}^{s} 1/E[T(j)]} \tag{7.47}$$

$$P(j) = \frac{1/E[T(j)]}{\sum_{j=1}^{s} 1/E[T(j)]} \tag{7.48}$$

In our next example we use the preceding to reanalyze the model of Example 7.7.

Example 7.37 Suppose that each play of a game is, independently of the outcomes
of previous plays, won by player i with probability $p_i$, $i = 1, \ldots, s$. Suppose further
that there are specified numbers $n(1), \ldots, n(s)$ such that the first player i to win $n(i)$
consecutive plays is declared the winner of the match. Find the expected number of
plays until there is a winner, and also the probability that the winner is i, $i = 1, \ldots, s$.

Solution: Letting $A(i)$, for $i = 1, \ldots, s$, denote the pattern of $n(i)$ consecutive values
of i, this problem asks for $P(i)$, the probability that pattern $A(i)$ occurs first, and for
$E[M]$. Because

$$E[T(i)] = (1/p_i)^{n(i)} + (1/p_i)^{n(i)-1} + \cdots + 1/p_i = \frac{1 - p_i^{n(i)}}{p_i^{n(i)}(1 - p_i)}$$

we obtain, from Equations (7.47) and (7.48), that

$$E[M] = \frac{1}{\sum_{j=1}^{s} \left[p_j^{n(j)}(1 - p_j)/(1 - p_j^{n(j)})\right]}$$

$$P(i) = \frac{p_i^{n(i)}(1 - p_i)/(1 - p_i^{n(i)})}{\sum_{j=1}^{s} \left[p_j^{n(j)}(1 - p_j)/(1 - p_j^{n(j)})\right]}$$
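The formulas of Example 7.37 are easy to evaluate directly. The following short sketch (the name `match_summary` and its argument layout are illustrative assumptions, not notation from the text) computes E[T(i)] as the finite geometric sum and then applies Equations (7.47) and (7.48).

```python
from fractions import Fraction


def match_summary(p, n):
    """Expected number of plays and winning probabilities for Example 7.37.

    p[i]: probability that player i wins a play
    n[i]: number of consecutive wins player i needs to win the match
    Returns (E[M], [P(1), ..., P(s)]).
    """
    # E[T(i)] = (1/p_i)^{n(i)} + ... + 1/p_i
    mean_T = [
        sum((1 / Fraction(p_i)) ** k for k in range(1, n_i + 1))
        for p_i, n_i in zip(p, n)
    ]
    inverse_sum = sum(1 / t for t in mean_T)
    expected_plays = 1 / inverse_sum                   # Equation (7.47)
    win_probs = [(1 / t) / inverse_sum for t in mean_T]  # Equation (7.48)
    return expected_plays, win_probs


half = Fraction(1, 2)
print(match_summary([half, half], [1, 2]))
```

In the usage line, player 1 needs a single win and player 2 needs two in a row. A direct argument gives P(1) = 1/2 + 1/4 = 3/4 and E[M] = 3/2, which agrees with the values the sketch returns.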

7.9.2 The Expected Time to a Maximal Run of Distinct Values 

Let $X_i$, $i \ge 1$, be independent and identically distributed random variables that are
equally likely to take on any of the values $1, 2, \ldots, m$. Suppose that these random
variables are observed sequentially, and let T denote the first time that a run of m
consecutive values includes all the values $1, \ldots, m$. That is,

$$T = \min\{n : X_{n-m+1}, \ldots, X_n \text{ are all distinct}\}$$


To compute E[T], define a renewal process by letting the first renewal occur at time T. 
At this point start over and, without using any of the data values up to T, let the next 
renewal occur the next time a run of m consecutive values are all distinct, and so on. 
For instance, if m = 3 and the data are 


1, 3, 3, 2, 1, 2, 3, 2, 1, 3, ...,


(7.49) 


then there are two renewals by time 10, with the renewals occurring at times 5 and 9. 
We call the sequence of m distinct values that constitutes a renewal a renewal run. 
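The renewal times just described can be traced mechanically. Below is a minimal sketch (the helper name `distinct_run_renewals` is an assumption made for illustration) that scans the data, declares a renewal whenever the last m values observed since the previous renewal are all distinct, and then starts over.

```python
def distinct_run_renewals(data, m):
    """Times (1-indexed) at which renewals occur for runs of m distinct values."""
    renewals = []
    since_renewal = []  # values observed after the most recent renewal
    for time, value in enumerate(data, start=1):
        since_renewal.append(value)
        window = since_renewal[-m:]
        if len(window) == m and len(set(window)) == m:
            renewals.append(time)
            since_renewal = []  # start over, ignoring all earlier data
    return renewals


print(distinct_run_renewals([1, 3, 3, 2, 1, 2, 3, 2, 1, 3], 3))  # [5, 9]
```

With m = 3 and the ten data values displayed above, the sketch reports renewals at times 5 and 9.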

Let us now transform the renewal process into a delayed renewal reward process by
supposing that a reward of 1 is earned at time n, $n \ge m$, if the values $X_{n-m+1}, \ldots, X_n$
are all distinct. That is, a reward is earned each time the previous m data values are all
distinct. For instance, if $m = 3$ and the data values are as in (7.49) then unit rewards
are earned at times 5, 7, 9, and 10. If we let $R_i$ denote the reward earned at time i, then
by Proposition 7.3,

$$\lim_{n \to \infty} \frac{E\left[\sum_{i=1}^{n} R_i\right]}{n} = \frac{E[R]}{E[T]} \tag{7.50}$$

where R is the reward earned between renewal epochs. Now, with $A_i$ equal to the set
of the first i data values of a renewal run, and $B_i$ to the set of the first i values following
this renewal run, we have the following:

$$\begin{aligned}
E[R] &= 1 + \sum_{i=1}^{m-1} E[\text{reward earned a time } i \text{ after a renewal}] \\
&= 1 + \sum_{i=1}^{m-1} P\{A_i = B_i\} \\
&= 1 + \sum_{i=1}^{m-1} \frac{i!}{m^i} \\
&= \sum_{i=0}^{m-1} \frac{i!}{m^i}
\end{aligned}$$

Hence, since for $i \ge m$

$$E[R_i] = P\{X_{i-m+1}, \ldots, X_i \text{ are all distinct}\} = \frac{m!}{m^m}$$

it follows from Equation (7.50) that

$$\frac{m!}{m^m} = \frac{E[R]}{E[T]}$$

Thus, from Equation (7.51) we obtain 


$$E[T] = \frac{m^m}{m!} \sum_{i=0}^{m-1} \frac{i!}{m^i}$$


(7.51) 
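As a quick numerical illustration of the preceding expression for E[T], the sketch below evaluates it exactly for small m. The function name `expected_time_distinct_run` is an illustrative assumption.

```python
from fractions import Fraction
from math import factorial


def expected_time_distinct_run(m):
    """E[T] = (m^m / m!) * sum_{i=0}^{m-1} i! / m^i, as an exact Fraction."""
    partial_sum = sum(Fraction(factorial(i), m ** i) for i in range(m))
    return Fraction(m ** m, factorial(m)) * partial_sum


for m in (1, 2, 3):
    print(m, expected_time_distinct_run(m))
```

The case m = 2 gives a simple sanity check: T is then 1 plus a geometric number of trials with success probability 1/2, so E[T] = 3, which is the value the formula returns.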


The preceding delayed renewal reward process approach also gives us another way 
of computing the expected time until a specified pattern appears. We illustrate by the 
following example. 

Example 7.38 Compute $E[T]$, the expected time until the pattern h, h, h, t, h, h, h
appears, when a coin that comes up heads with probability p and tails with probability
$q = 1 - p$ is continually flipped.

Solution: Define a renewal process by letting the first renewal occur when the
pattern first appears, and then start over. Also, say that a reward of 1 is earned
whenever the pattern appears. If R is the reward earned between renewal epochs, we
have

$$\begin{aligned}
E[R] &= 1 + \sum_{i=1}^{6} E[\text{reward earned } i \text{ units after a renewal}] \\
&= 1 + 0 + 0 + 0 + p^3 q + p^3 q p + p^3 q p^2
\end{aligned}$$

Hence, since the expected reward earned at time i is $E[R_i] = p^6 q$, we obtain the
following from the renewal reward theorem:

$$\frac{1 + qp^3 + qp^4 + qp^5}{E[T]} = qp^6$$

or

$$E[T] = q^{-1}p^{-6} + p^{-3} + p^{-2} + p^{-1}$$ ■
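The renewal reward computation of Example 7.38 can be mimicked for an arbitrary pattern. The sketch below is an illustration under stated assumptions rather than a method given in the text: it assumes independent trials, and the names `pattern_mean_via_rewards` and `prob` are hypothetical. A reward i units after a renewal is possible only when the last r - i symbols of the pattern coincide with its first r - i symbols, in which case its expected value is the probability of the remaining i symbols.

```python
from fractions import Fraction


def pattern_mean_via_rewards(pattern, prob):
    """E[T] = E[R] / P(pattern), following the renewal reward argument."""
    r = len(pattern)

    def prob_of(symbols):
        result = Fraction(1)
        for symbol in symbols:
            result *= prob[symbol]
        return result

    expected_reward = Fraction(0)
    for i in range(r):
        # i = 0 is the reward earned at the renewal itself (probability 1).
        if pattern[i:] == pattern[:r - i]:
            expected_reward += prob_of(pattern[r - i:])
    return expected_reward / prob_of(pattern)


p = Fraction(1, 2)
coin = {"h": p, "t": 1 - p}
print(pattern_mean_via_rewards("hhhthhh", coin))  # 142
```

For a fair coin the closed form above gives 2(64) + 8 + 4 + 2 = 142, matching the printed value.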

7.9.3 Increasing Runs of Continuous Random Variables 

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed continuous
random variables, and let T denote the first time that there is a string of r consecutive
increasing values. That is,

$$T = \min\{n \ge r : X_{n-r+1} < X_{n-r+2} < \cdots < X_n\}$$

To compute $E[T]$, define a renewal process as follows. Let the first renewal occur at T.
Then, using only the data values after T, say that the next renewal occurs when there
is again a string of r consecutive increasing values, and continue in this fashion. For
instance, if $r = 3$ and the first 15 data values are

12, 20, 22, 28, 43, 18, 24, 33, 60, 4, 16, 8, 12, 15, 18 

then 3 renewals would have occurred by time 15, namely, at times 3, 8, and 14. If we 
let N(n) denote the number of renewals by time n, then by the elementary renewal 
theorem 

$$\frac{E[N(n)]}{n} \to \frac{1}{E[T]}$$
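The renewal times in the 15-value illustration above can be reproduced with a short scan of the data. The following is a minimal sketch; the helper name `increasing_run_renewals` is an assumption made for illustration.

```python
def increasing_run_renewals(data, r):
    """Times (1-indexed) of renewals for strings of r consecutive increasing values."""
    renewals = []
    run_length = 0
    previous = None  # None means no value has been seen since the last renewal
    for time, value in enumerate(data, start=1):
        if previous is not None and value > previous:
            run_length += 1
        else:
            run_length = 1
        previous = value
        if run_length == r:
            renewals.append(time)
            # Start over, using only the data values after this renewal.
            run_length = 0
            previous = None
    return renewals


values = [12, 20, 22, 28, 43, 18, 24, 33, 60, 4, 16, 8, 12, 15, 18]
print(increasing_run_renewals(values, 3))  # [3, 8, 14]
```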

To compute $E[N(n)]$, define a stochastic process whose state at time k, call it $S_k$, is
equal to the number of consecutive increasing values at time k. That is, for $1 \le j \le k$

$$S_k = j \quad \text{if } X_{k-j} > X_{k-j+1} < \cdots < X_{k-1} < X_k$$

where $X_0 = \infty$. Note that a renewal will occur at time k if and only if $S_k = ir$ for
some $i \ge 1$. For instance, if $r = 3$ and

$$X_5 > X_6 < X_7 < X_8 < X_9 < X_{10} < X_{11}$$

then

$$S_6 = 1, \quad S_7 = 2, \quad S_8 = 3, \quad S_9 = 4, \quad S_{10} = 5, \quad S_{11} = 6$$

and renewals occur at times 8 and 11. Now, for $k > j$

$$\begin{aligned}
P\{S_k = j\} &= P\{X_{k-j} > X_{k-j+1} < \cdots < X_{k-1} < X_k\} \\
&= P\{X_{k-j+1} < \cdots < X_{k-1} < X_k\} - P\{X_{k-j} < X_{k-j+1} < \cdots < X_{k-1} < X_k\} \\
&= \frac{1}{j!} - \frac{1}{(j+1)!} \\
&= \frac{j}{(j+1)!}
\end{aligned}$$

where the next to last equality follows since all possible orderings of the random
variables are equally likely.

From the preceding, we see that

$$\lim_{k \to \infty} P\{\text{a renewal occurs at time } k\} = \lim_{k \to \infty} \sum_{i=1}^{\infty} P\{S_k = ir\} = \sum_{i=1}^{\infty} \frac{ir}{(ir + 1)!}$$

However,

$$E[N(n)] = \sum_{k=1}^{n} P\{\text{a renewal occurs at time } k\}$$

Because we can show that for any numbers $a_k$, $k \ge 1$, for which $\lim_{k \to \infty} a_k$ exists that

$$\lim_{n \to \infty} \frac{\sum_{k=1}^{n} a_k}{n} = \lim_{k \to \infty} a_k$$

we obtain from the preceding, upon using the elementary renewal theorem,

$$E[T] = \frac{1}{\sum_{i=1}^{\infty} ir/(ir + 1)!}$$
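The series in the final expression converges very quickly, so E[T] is easy to evaluate numerically. The sketch below truncates the sum after a fixed number of terms; the function name `expected_time_increasing_run` and the default of 50 terms are assumptions made for illustration.

```python
import math


def expected_time_increasing_run(r, terms=50):
    """Approximate E[T] = 1 / sum_{i>=1} ir / (ir + 1)! by truncating the series."""
    total = sum((i * r) / math.factorial(i * r + 1) for i in range(1, terms + 1))
    return 1.0 / total


for r in (1, 2, 3):
    print(r, expected_time_increasing_run(r))
```

Two cases can be checked by hand. For r = 1 the series telescopes to 1, so E[T] = 1. For r = 2 the terms are 1/(2i)! - 1/(2i+1)!, whose sum is cosh(1) - sinh(1) = 1/e, so E[T] = e.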


7.10 The Insurance Ruin Problem 


Suppose that claims are made to an insurance firm according to a Poisson process
with rate $\lambda$, and that the successive claim amounts $Y_1, Y_2, \ldots$ are independent random
variables having a common distribution function F with density $f(x)$. Suppose also
that the claim amounts are independent of the claim arrival times. Thus, if we let $M(t)$
be the number of claims made by time t, then $\sum_{i=1}^{M(t)} Y_i$ is the total amount paid out in
claims by time t. Supposing that the firm starts with an initial capital x and receives
income at a constant rate c per unit time, we are interested in the probability that the
firm’s net capital ever becomes negative; that is, we are interested in

$$R(x) = P\left\{\sum_{i=1}^{M(t)} Y_i > x + ct \text{ for some } t \ge 0\right\}$$

If the firm’s capital ever becomes negative, we say that the firm is ruined; thus $R(x)$ is
the probability of ruin given that the firm begins with an initial capital x.

Let $\mu = E[Y_i]$ be the mean claim amount, and let $\rho = \lambda\mu/c$. Because claims occur
at rate $\lambda$, the long-run rate at which money is paid out is $\lambda\mu$. (A formal argument uses
renewal reward processes. A new cycle begins when a claim occurs; the cost for the
cycle is the claim amount, and so the long-run average cost is $\mu$, the expected cost
incurred in a cycle, divided by $1/\lambda$, the mean cycle time.) Because the rate at which
money is received is c, it is clear that $R(x) = 1$ when $\rho > 1$. As $R(x)$ can be shown to
also equal 1 when $\rho = 1$ (think of the recurrence of the symmetric random walk), we
will suppose that $\rho < 1$.

To determine $R(x)$, we start by deriving a differential equation. To begin, consider
what can happen in the first h time units, where h is small. With probability $1 - \lambda h + o(h)$
there will be no claims and the firm’s capital at time h will be $x + ch$; with
probability $\lambda h + o(h)$ there will be exactly one claim and the firm’s capital at time h
will be $x + ch - Y_1$; with probability $o(h)$ there will be two or more claims. Therefore,
conditioning on what happens during the first h time units yields

$$R(x) = (1 - \lambda h)R(x + ch) + \lambda h E[R(x + ch - Y_1)] + o(h)$$

Equivalently,

$$R(x + ch) - R(x) = \lambda h R(x + ch) - \lambda h E[R(x + ch - Y_1)] + o(h)$$

Dividing through by ch gives

$$\frac{R(x + ch) - R(x)}{ch} = \frac{\lambda}{c}R(x + ch) - \frac{\lambda}{c}E[R(x + ch - Y_1)] + \frac{1}{c}\frac{o(h)}{h}$$

Letting h go to 0 yields the differential equation

$$R'(x) = \frac{\lambda}{c}R(x) - \frac{\lambda}{c}E[R(x - Y_1)]$$

Because $R(u) = 1$ when $u < 0$, the preceding can be written as

$$R'(x) = \frac{\lambda}{c}R(x) - \frac{\lambda}{c}\int_0^x R(x - y)f(y)\,dy - \frac{\lambda}{c}\int_x^{\infty} f(y)\,dy$$

or, equivalently,

$$R'(x) = \frac{\lambda}{c}R(x) - \frac{\lambda}{c}\int_0^x R(x - y)f(y)\,dy - \frac{\lambda}{c}\bar{F}(x) \tag{7.52}$$

where $\bar{F}(x) = 1 - F(x)$.

We will now use the preceding equation to show that $R(x)$ also satisfies the equation

$$R(x) = R(0) + \frac{\lambda}{c}\int_0^x R(x - y)\bar{F}(y)\,dy - \frac{\lambda}{c}\int_0^x \bar{F}(y)\,dy, \quad x \ge 0 \tag{7.53}$$

To verify Equation (7.53), we will show that differentiating both sides of it results in
Equation (7.52). (It can be shown that both (7.52) and (7.53) have unique solutions.)
To do so, we will need the following lemma, whose proof is given at the end of this
section.

Lemma 7.5 For a function k, and a differentiable function t,

$$\frac{d}{dx}\int_0^x t(x - y)k(y)\,dy = t(0)k(x) + \int_0^x t'(x - y)k(y)\,dy$$


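Differentiating both sides of Equation (7.53) gives, upon using the preceding lemma,

$$R'(x) = \frac{\lambda}{c}\left[R(0)\bar{F}(x) + \int_0^x R'(x - y)\bar{F}(y)\,dy - \bar{F}(x)\right] \tag{7.54}$$

Integration by parts $[u = \bar{F}(y),\ dv = R'(x - y)\,dy]$ shows that

$$\begin{aligned}
\int_0^x R'(x - y)\bar{F}(y)\,dy &= -\bar{F}(y)R(x - y)\Big|_0^x - \int_0^x R(x - y)f(y)\,dy \\
&= -\bar{F}(x)R(0) + R(x) - \int_0^x R(x - y)f(y)\,dy
\end{aligned}$$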

Substituting this result back in Equation (7.54) gives Equation (7.52). Thus, we have 
established Equation (7.53). 

To obtain a more usable expression for $R(x)$, consider a renewal process whose
interarrival times $X_1, X_2, \ldots$ are distributed according to the equilibrium distribution
of F. That is, the density function of the $X_i$ is

$$f_e(x) = F_e'(x) = \frac{\bar{F}(x)}{\mu}$$

Let $N(t)$ denote the number of renewals by time t, and let us derive an expression for

$$q(x) = E[\rho^{N(x)+1}]$$

Conditioning on $X_1$ gives

$$q(x) = \int_0^{\infty} E[\rho^{N(x)+1} \mid X_1 = y]\,\frac{\bar{F}(y)}{\mu}\,dy$$

Because, given that $X_1 = y$, the number of renewals by time x is distributed as
$1 + N(x - y)$ when $y \le x$, or is identically 0 when $y > x$, we see that

$$E[\rho^{N(x)+1} \mid X_1 = y] = \begin{cases} \rho E[\rho^{N(x-y)+1}], & \text{if } y \le x \\ \rho, & \text{if } y > x \end{cases}$$

Therefore, $q(x)$ satisfies

$$\begin{aligned}
q(x) &= \int_0^x \rho\, q(x - y)\frac{\bar{F}(y)}{\mu}\,dy + \rho\int_x^{\infty} \frac{\bar{F}(y)}{\mu}\,dy \\
&= \frac{\lambda}{c}\int_0^x q(x - y)\bar{F}(y)\,dy + \frac{\lambda}{c}\left[\int_0^{\infty} \bar{F}(y)\,dy - \int_0^x \bar{F}(y)\,dy\right] \\
&= \frac{\lambda}{c}\int_0^x q(x - y)\bar{F}(y)\,dy + \rho - \frac{\lambda}{c}\int_0^x \bar{F}(y)\,dy
\end{aligned}$$

Because $q(0) = \rho$, this is exactly the same equation that is satisfied by $R(x)$, namely
Equation (7.53). Therefore, because the solution to (7.53) is unique, we obtain the
following.

Proposition 7.6

$$R(x) = q(x) = E[\rho^{N(x)+1}]$$

Example 7.39 Suppose that the firm does not start with any initial capital. Then,
because $N(0) = 0$, we see that the firm’s probability of ruin is $R(0) = \rho$. ■

Example 7.40 If the claim distribution F is exponential with mean $\mu$, then so is $F_e$.
Hence, $N(x)$ is Poisson with mean $x/\mu$, giving the result

$$\begin{aligned}
R(x) = E[\rho^{N(x)+1}] &= \sum_{n=0}^{\infty} \rho^{n+1} e^{-x/\mu}(x/\mu)^n/n! \\
&= \rho\, e^{-x/\mu} \sum_{n=0}^{\infty} (\rho x/\mu)^n/n! \\
&= \rho\, e^{-x(1-\rho)/\mu}
\end{aligned}$$ ■
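The closed form of Example 7.40 can be compared numerically with the Poisson series it was derived from. The sketch below is illustrative only: the function names and the truncation at 200 terms are assumptions, and it covers only the exponential claim distribution treated in the example.

```python
import math


def ruin_probability_series(x, rho, mu, terms=200):
    """Truncated series E[rho^(N(x)+1)] with N(x) Poisson with mean x/mu."""
    mean = x / mu
    poisson_term = math.exp(-mean)  # P{N(x) = 0}
    total = 0.0
    for n in range(terms):
        total += rho ** (n + 1) * poisson_term
        poisson_term *= mean / (n + 1)  # advance to P{N(x) = n + 1}
    return total


def ruin_probability_closed_form(x, rho, mu):
    """R(x) = rho * exp(-x (1 - rho) / mu) for exponential claims with mean mu."""
    return rho * math.exp(-x * (1 - rho) / mu)


print(ruin_probability_series(5.0, 0.8, 2.0))
print(ruin_probability_closed_form(5.0, 0.8, 2.0))
```

The two printed values should agree to within floating-point rounding, and both reduce to the value of Example 7.39 when x = 0.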


To obtain some intuition about the ruin probability, let T be independent of the
interarrival times $X_i$ of the renewal process having interarrival distribution $F_e$, and let
T have probability mass function

$$P\{T = n\} = \rho^n(1 - \rho), \quad n = 0, 1, \ldots$$

Now consider $P\left\{\sum_{i=1}^{T} X_i > x\right\}$, the probability that the sum of the first T of the $X_i$
exceeds x. Because $N(x) + 1$ is the first renewal that occurs after time x, we have

$$N(x) + 1 = \min\left\{n : \sum_{i=1}^{n} X_i > x\right\}$$

Therefore, conditioning on the number of renewals by time x gives

$$\begin{aligned}
P\left\{\sum_{i=1}^{T} X_i > x\right\} &= \sum_{j=0}^{\infty} P\left\{\sum_{i=1}^{T} X_i > x \,\Big|\, N(x) = j\right\} P\{N(x) = j\} \\
&= \sum_{j=0}^{\infty} P\{T \ge j + 1 \mid N(x) = j\}\, P\{N(x) = j\} \\
&= \sum_{j=0}^{\infty} P\{T \ge j + 1\}\, P\{N(x) = j\} \\
&= \sum_{j=0}^{\infty} \rho^{j+1} P\{N(x) = j\} \\
&= E[\rho^{N(x)+1}]
\end{aligned}$$

Consequently, $P\left\{\sum_{i=1}^{T} X_i > x\right\}$ is equal to the ruin probability. Now, as noted in
Example 7.39, the ruin probability of a firm starting with 0 initial capital is $\rho$. Suppose
that the firm starts with an initial capital x, and suppose for the moment that it is allowed
to remain in business even if its capital becomes negative. Because the probability
that the firm’s capital ever falls below its initial starting amount x is the same as the
probability that its capital ever becomes negative when it starts with 0, this probability
is also $\rho$. Thus, if we say that a low occurs whenever the firm’s capital becomes lower
than it has ever previously been, then the probability that a low ever occurs is $\rho$. Now,
if a low does occur, then the probability that there will be another low is the probability
that the firm’s capital will ever fall below its previous low, and clearly this is also $\rho$.
Therefore, each new low is the final one with probability $1 - \rho$. Consequently, the total
number of lows that ever occur has the same distribution as T. In addition, if we let $W_i$
be the amount by which the ith low is less than the low preceding it, it is easy to see
that $W_1, W_2, \ldots$ are independent and identically distributed, and are also independent
of the number of lows. Because the minimal value over all time of the firm’s capital
(when it is allowed to remain in business even when its capital becomes negative) is
$x - \sum_{i=1}^{T} W_i$, it follows that the ruin probability of a firm that starts with an initial
capital x is

$$R(x) = P\left\{\sum_{i=1}^{T} W_i > x\right\}$$

Because

$$R(x) = E[\rho^{N(x)+1}] = P\left\{\sum_{i=1}^{T} X_i > x\right\}$$

we can identify $W_i$ with $X_i$. That is, we can conclude that each new low is lower than
its predecessor by a random amount whose distribution is the equilibrium distribution
of a claim amount.
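The interpretation of the ruin probability as the chance that a geometric number of successive drops exceeds the initial capital lends itself to a simple Monte Carlo check. The sketch below is an illustration under explicit assumptions: claims are taken to be exponential with mean mu, so that the equilibrium distribution of a drop is again exponential with mean mu (as in Example 7.40); the function name, seed, and trial count are arbitrary choices.

```python
import math
import random


def simulate_ruin_probability(x, rho, mu, trials=100_000, seed=0):
    """Estimate P{W_1 + ... + W_T > x} with T geometric and W_i exponential(mean mu)."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        total_drop = 0.0
        # Each further low occurs with probability rho, independently of the past.
        while rng.random() < rho:
            total_drop += rng.expovariate(1.0 / mu)
            if total_drop > x:
                # The drops only accumulate, so ruin is already certain here.
                ruined += 1
                break
    return ruined / trials


estimate = simulate_ruin_probability(x=2.0, rho=0.5, mu=1.0)
exact = 0.5 * math.exp(-2.0 * (1 - 0.5) / 1.0)
print(estimate, exact)
```

With these parameters the exact value from Example 7.40 is about 0.184, and the simulated estimate should be close to it, up to sampling error.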

Remark Because the times between successive customer claims are independent
exponential random variables with mean $1/\lambda$ while money is being paid to the insurance
firm at a constant rate c, it follows that the amounts of money paid in to the insurance
company between consecutive claims are independent exponential random variables
with mean $c/\lambda$. Thus, because ruin can only occur when a claim arises, it follows
that the expression given in Proposition 7.6 for the ruin probability $R(x)$ is valid for
any model in which the amounts of money paid to the insurance firm between claims
are independent exponential random variables with mean $c/\lambda$ and the amounts of the
successive claims are independent random variables having distribution function F,
with these two processes being independent.

Now imagine an insurance model in which customers buy policies at arbitrary times,
each customer pays the insurance company a fixed rate c per unit time, the time until
a customer makes a claim is exponential with rate $\lambda$, and each claim amount has
distribution F. Consider the amount of money the insurance firm takes in between
claims. Specifically, suppose a claim has just occurred and let X be the amount the
insurance company takes in before another claim arises. Note that this amount increases
continuously in time until a claim occurs, and suppose that at the present time the amount
t has been taken in since the last claim. Let us compute the probability that a claim
will be made before the amount taken in increases by an additional amount h, when h
is small. To determine this probability, suppose that at the present time the firm has k
customers. Because each of these k customers is paying the insurance firm at rate c,
it follows that the additional amount taken in by the firm before the next claim occurs
will be less than h if and only if a claim is made within the next $h/(kc)$ time units. Because
each of the k customers will register a claim at an exponential rate $\lambda$, the time until
one of them makes a claim is an exponential random variable with rate $k\lambda$. Calling this
random variable $E_{k\lambda}$, it follows that the probability that the additional amount taken in
is less than h is

$$\begin{aligned}
P(\text{additional amount} < h \mid k \text{ customers}) &= P\left(E_{k\lambda} < \frac{h}{kc}\right) \\
&= 1 - e^{-\lambda h/c} \\
&= \frac{\lambda}{c}h + o(h)
\end{aligned}$$

Thus,

$$P(X < t + h \mid X > t) = \frac{\lambda}{c}h + o(h)$$

showing that the failure rate function of X is identically $\lambda/c$. But this means that the
amounts taken in between claims are exponential random variables with mean $c/\lambda$.
Because the amounts of each claim have distribution function F, we can thus conclude
that the firm’s failure probability in this insurance model is exactly the same as
in the previously analyzed classical model. ■

Let us now give the proof of Lemma 7.5. 

Proof of Lemma 7.5. Let $G(x) = \int_0^x t(x - y)k(y)\,dy$. Then

$$\begin{aligned}
G(x + h) - G(x) &= G(x + h) - \int_0^x t(x + h - y)k(y)\,dy + \int_0^x t(x + h - y)k(y)\,dy - G(x) \\
&= \int_x^{x+h} t(x + h - y)k(y)\,dy + \int_0^x [t(x + h - y) - t(x - y)]k(y)\,dy
\end{aligned}$$

Dividing through by h gives

$$\frac{G(x + h) - G(x)}{h} = \frac{1}{h}\int_x^{x+h} t(x + h - y)k(y)\,dy + \int_0^x \frac{t(x + h - y) - t(x - y)}{h}\,k(y)\,dy$$

Letting $h \to 0$ gives the result

$$G'(x) = t(0)k(x) + \int_0^x t'(x - y)k(y)\,dy$$

Exercises 

1. Is it true that 

(a) $N(t) < n$ if and only if $S_n > t$?

(b) $N(t) \le n$ if and only if $S_n \ge t$?

(c) $N(t) > n$ if and only if $S_n < t$?

2. Suppose that the interarrival distribution for a renewal process is Poisson distributed
with mean $\mu$. That is, suppose

$$P\{X_n = k\} = e^{-\mu}\frac{\mu^k}{k!}, \quad k = 0, 1, \ldots$$

(a) Find the distribution of S n . 

(b) Calculate P{N(t) = n}. 

*3. Let $S_n$ denote the time of the nth event of the renewal process $\{N(t), t \ge 0\}$
having interarrival distribution F.

(a) What is $P(N(t) = n \mid S_n = y)$?

(b) Starting with

$$P(N(t) = n) = \int_0^{\infty} P(N(t) = n \mid S_n = y) f_{S_n}(y)\,dy$$

and using that the sum of n independent exponentials with rate $\lambda$ has the Gamma
$(n, \lambda)$ distribution, derive $P(N(t) = n)$ when $F(y) = 1 - e^{-\lambda y}$.

4. Let $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ be independent renewal processes. Let
$N(t) = N_1(t) + N_2(t)$.

(a) Are the interarrival times of $\{N(t), t \ge 0\}$ independent?

(b) Are they identically distributed?

(c) Is $\{N(t), t \ge 0\}$ a renewal process?

5. Let $U_1, U_2, \ldots$ be independent uniform (0, 1) random variables, and define N by

$$N = \min\{n : U_1 + U_2 + \cdots + U_n > 1\}$$

What is $E[N]$?

*6. Consider a renewal process $\{N(t), t \ge 0\}$ having a gamma $(r, \lambda)$ interarrival
distribution. That is, the interarrival density is

$$f(x) = \frac{\lambda e^{-\lambda x}(\lambda x)^{r-1}}{(r - 1)!}, \quad x > 0$$

(a) Show that

$$P\{N(t) \ge n\} = \sum_{i=nr}^{\infty} \frac{e^{-\lambda t}(\lambda t)^i}{i!}$$

(b) Show that

$$m(t) = \sum_{i=r}^{\infty} \left[\frac{i}{r}\right] \frac{e^{-\lambda t}(\lambda t)^i}{i!}$$

where $[i/r]$ is the largest integer less than or equal to $i/r$.

Hint: Use the relationship between the gamma $(r, \lambda)$ distribution and the sum
of r independent exponentials with rate $\lambda$ to define $N(t)$ in terms of a Poisson
process with rate $\lambda$.

7. Mr. Smith works on a temporary basis. The mean length of each job he gets 
is three months. If the amount of time he spends between jobs is exponentially 
distributed with mean 2, then at what rate does Mr. Smith get new jobs? 

*8. A machine in use is replaced by a new machine either when it fails or when it 
reaches the age of T years. If the lifetimes of successive machines are independent 
with a common distribution F having density f, show that

(a) the long-run rate at which machines are replaced equals

$$\left[\int_0^T x f(x)\,dx + T(1 - F(T))\right]^{-1}$$

(b) the long-run rate at which machines in use fail equals

$$\frac{F(T)}{\int_0^T x f(x)\,dx + T[1 - F(T)]}$$

9. A worker sequentially works on jobs. Each time a job is completed, a new one is
begun. Each job, independently, takes a random amount of time having distribution
F to complete. However, independently of this, shocks occur according to a
Poisson process with rate $\lambda$. Whenever a shock occurs, the worker discontinues
working on the present job and starts a new one. In the long run, at what rate are
jobs completed?

10. Consider a renewal process with mean interarrival time $\mu$. Suppose that each
event of this process is independently “counted” with probability p. Let $N_C(t)$
denote the number of counted events by time t, $t > 0$.

(a) Is $N_C(t)$, $t \ge 0$ a renewal process?

(b) What is $\lim_{t \to \infty} N_C(t)/t$?

11. A renewal process for which the time until the initial renewal has a different 
distribution than the remaining interarrival times is called a delayed (or a general) 
renewal process. Prove that Proposition 7.1 remains valid for a delayed renewal 
process. (In general, it can be shown that all of the limit theorems for a renewal 
process remain valid for a delayed renewal process provided that the time until 
the first renewal has a finite mean.) 

12. Events occur according to a Poisson process with rate $\lambda$. Any event that occurs
within a time d of the event that immediately preceded it is called a d-event. For
instance, if $d = 1$ and events occur at times 2, 2.8, 4, 6, 6.6, ..., then the events
at times 2.8 and 6.6 would be d-events.

(a) At what rate do d-events occur?

(b) What proportion of all events are d-events?

13. In each game played one is equally likely to either win or lose 1. Let X be your 
cumulative winnings if you use the strategy that quits playing if you win the first 
game, and plays two more games and then quits if you lose the first game. 

(a) Use Wald’s equation to determine $E[X]$.

(b) Compute the probability mass function of X and use it to find E[X]. 

14. Consider the gambler’s ruin problem where on each bet the gambler either wins 1 
with probability p or loses 1 with probability 1 — p. The gambler will continue to 
play until his winnings are either N — i or —i. (That is, starting with i the gambler 
will quit when his fortune reaches either N or 0.) Let T denote the number of 
bets made before the gambler stops. Use Wald’s equation, along with the known 
probability that the gambler’s final winnings are $N - i$, to find $E[T]$.

Hint: Let $X_j$ be the gambler’s winnings on bet j, $j \ge 1$. What are the possible
values of $\sum_{j=1}^{T} X_j$? What is $E\left[\sum_{j=1}^{T} X_j\right]$?

15. Consider a miner trapped in a room that contains three doors. Door 1 leads him to 
freedom after two days of travel; door 2 returns him to his room after a four-day 
journey; and door 3 returns him to his room after a six-day journey. Suppose at 
all times he is equally likely to choose any of the three doors, and let T denote 
the time it takes the miner to become free. 

(a) Define a sequence of independent and identically distributed random variables
$X_1, X_2, \ldots$ and a stopping time N such that

$$T = \sum_{i=1}^{N} X_i$$





Note: You may have to imagine that the miner continues to randomly choose 
doors even after he reaches safety. 

(b) Use Wald’s equation to find $E[T]$.

(c) Compute $E\left[\sum_{i=1}^{N} X_i \mid N = n\right]$ and note that it is not equal to $E\left[\sum_{i=1}^{n} X_i\right]$.

(d) Use part (c) for a second derivation of $E[T]$.

16. A deck of 52 playing cards is shuffled and the cards are then turned face up one
at a time. Let $X_i$ equal 1 if the ith card turned over is an ace, and let it be 0
otherwise, $i = 1, \ldots, 52$. Also, let N denote the number of cards that need be
turned over until all four aces appear. That is, the final ace appears on the Nth
card to be turned over. Is the equation

$$E\left[\sum_{i=1}^{N} X_i\right] = E[N]E[X_i]$$


valid? If not, why is Wald’s equation not applicable? 

17. In Example 7.6, suppose that potential customers arrive in accordance with a 
renewal process having interarrival distribution F. Would the number of events 
by time t constitute a (possibly delayed) renewal process if an event corresponds
to a customer 


(a) entering the bank? 

(b) leaving the bank? 

What if F were exponential? 

*18. Compute the renewal function when the interarrival distribution F is such that

$$1 - F(t) = pe^{-\mu_1 t} + (1 - p)e^{-\mu_2 t}$$

19. For the renewal process whose interarrival times are uniformly distributed over 
(0,1), determine the expected time from t = 1 until the next renewal. 

20. For a renewal reward process consider

$$W_n = \frac{R_1 + R_2 + \cdots + R_n}{X_1 + X_2 + \cdots + X_n}$$

where $W_n$ represents the average reward earned during the first n cycles. Show
that $W_n \to E[R]/E[X]$ as $n \to \infty$.

21. Consider a single-server bank for which customers arrive in accordance with a 
Poisson process with rate $\lambda$. If a customer will enter the bank only if the server
is free when he arrives, and if the service time of a customer has the distribution 
G, then what proportion of time is the server busy? 

*22. J’s car buying policy is to always buy a new car, repair all breakdowns that occur 
during the first T time units of ownership, and then junk the car and buy a new 
one at the first breakdown that occurs after the car has reached age T. Suppose 
that the time until the first breakdown of a new car is exponential with rate $\lambda$, and
that each time a car is repaired the time until the next breakdown is exponential
with rate $\mu$.

(a) At what rate does J buy new cars?

(b) Supposing that a new car costs C and that a cost r is incurred for each repair, 
what is J’s long run average cost per unit time? 

23. In a serve and rally competition involving players A and B, each rally that begins
with a serve by player A is won by player A with probability $p_a$ and is won by
player B with probability $q_a = 1 - p_a$, whereas each rally that begins with a
serve by player B is won by player A with probability $p_b$ and is won by player B
with probability $q_b = 1 - p_b$. The winner of the rally earns a point and becomes
the server of the next rally.

(a) In the long run, what proportion of points are won by A? 

(b) What proportion of points are won by A if the protocol is that the players 
alternate service? That is, if the service protocol is that A serves for the first 
point, then B for the second, then A for the third point, and so on. 

(c) Give the condition under which A wins a higher percentage of points under 
the winner serves protocol than under the alternating service protocol. 

24. Wald’s equation can also be proved by using renewal reward processes. Let N
be a stopping time for the sequence of independent and identically distributed
random variables $X_i$, $i \ge 1$.

(a) Let $N_1 = N$. Argue that the sequence of random variables $X_{N_1+1}, X_{N_1+2}, \ldots$
is independent of $X_1, \ldots, X_N$ and has the same distribution as the original
sequence $X_i$, $i \ge 1$.

Now treat $X_{N_1+1}, X_{N_1+2}, \ldots$ as a new sequence, and define a stopping time
$N_2$ for this sequence that is defined exactly as $N_1$ is on the original sequence.
(For instance, if $N_1 = \min\{n : X_n > 0\}$, then $N_2 = \min\{n : X_{N_1+n} > 0\}$.)
Similarly, define a stopping time $N_3$ on the sequence $X_{N_1+N_2+1}, X_{N_1+N_2+2}, \ldots$
that is identically defined on this sequence as $N_1$ is on the original sequence,
and so on.

(b) Is the reward process in which A, is the reward earned during period i a 
renewal reward process? If so, what is the length of the successive cycles? 

(c) Derive an expression for the average reward per unit time. 

(d) Use the strong law of large numbers to derive a second expression for the 
average reward per unit time. 

(e) Conclude Wald’s equation. 

25. Suppose in Example 7.13 that the arrival process is a Poisson process and suppose 
that the policy employed is to dispatch the train every t time units. 

(a) Determine the average cost per unit time. 

(b) Show that the minimal average cost per unit time for such a policy is approximately
c/2 plus the average cost per unit time for the best policy of the type
considered in that example.

26. Consider a train station to which customers arrive in accordance with a Poisson
process having rate $\lambda$. A train is summoned whenever there are N customers
waiting in the station, but it takes K units of time for the train to arrive at the
station. When it arrives, it picks up all waiting customers. Assuming that the train
station incurs a cost at a rate of nc per unit time whenever there are n customers
present, find the long-run average cost.

27. A machine consists of two independent components, the ith of which functions for
an exponential time with rate $\lambda_i$. The machine functions as long as at least one of
these components functions. (That is, it fails when both components have failed.)
When a machine fails, a new machine having both its components working is put
into use. A cost K is incurred whenever a machine failure occurs; operating costs
at rate $c_i$ per unit time are incurred whenever the machine in use has i working
components, $i = 1, 2$. Find the long-run average cost per unit time.

28. In Example 7.15, what proportion of the defective items produced is discovered? 

29. Consider a single-server queueing system in which customers arrive in accordance 
with a renewal process. Each customer brings in a random amount of work, chosen 
independently according to the distribution G. The server serves one customer at 
a time. However, the server processes work at rate i per unit time whenever there 
are i customers in the system. For instance, if a customer with workload 8 enters 
service when there are three other customers waiting in line, then if no one else 
arrives that customer will spend 2 units of time in service. If another customer 
arrives after 1 unit of time, then our customer will spend a total of 1.8 units of 
time in service provided no one else arrives. 
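
The arithmetic in the preceding illustration can be checked with a short Python sketch. The function name `time_in_service` and its arguments are illustrative and do not come from the text. Arrival times are given as offsets from the instant service begins, and it is assumed that nobody leaves the system before our customer does.

```python
def time_in_service(work, customers_present, arrival_offsets=()):
    """Time in service when the server works at rate i while i customers are present.

    work: workload of the customer in service.
    customers_present: number in the system when service begins (including this one).
    arrival_offsets: sorted arrival times of later customers, measured from the
        start of service (hypothetical input format).
    """
    elapsed = 0.0
    remaining = float(work)
    rate = customers_present
    for offset in arrival_offsets:
        span = offset - elapsed
        if remaining <= rate * span:
            break                      # service ends before this arrival
        remaining -= rate * span       # work done before the arrival
        elapsed = offset
        rate += 1                      # one more customer, so the rate goes up by 1
    return elapsed + remaining / rate


alone = time_in_service(8, 4)                 # nobody else arrives
with_arrival = time_in_service(8, 4, [1.0])   # one arrival after 1 unit of time
print(alone, with_arrival)
```

With a workload of 8 and four customers present, the first call reproduces the 2 units of time quoted above; the second call reproduces the 1.8 units obtained when another customer arrives after 1 unit of time.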


Let \(W_i\) denote the amount of time customer i spends in the system. Also, define
E[W] by

\[E[W] = \lim_{n \to \infty} (W_1 + \cdots + W_n)/n\]

and so E[W] is the average amount of time a customer spends in the system.

Let N denote the number of customers that arrive in a busy period.

(a) Argue that

\[E[W] = E[W_1 + \cdots + W_N]/E[N]\]

Let \(L_i\) denote the amount of work customer i brings into the system; and so
the \(L_i\), \(i \ge 1\), are independent random variables having distribution G.

(b) Argue that at any time t, the sum of the times spent in the system by all arrivals 
prior to t is equal to the total amount of work processed by time t. 

Hint: Consider the rate at which the server processes work. 

(c) Argue that

\[\sum_{i=1}^{N} W_i = \sum_{i=1}^{N} L_i\]



(d) Use Wald’s equation (see Exercise 13) to conclude that 


\[E[W] = \mu\]

where \(\mu\) is the mean of the distribution G. That is, the average time that
customers spend in the system is equal to the average work they bring to the 
system. 

*30. For a renewal process, let A(t) be the age at time t. Prove that if \(\mu < \infty\), then
with probability 1

\[\frac{A(t)}{t} \to 0 \quad \text{as } t \to \infty\]

31. If A(t) and Y(t) are, respectively, the age and the excess at time t of a renewal
process having an interarrival distribution F, calculate

\[P\{Y(t) > x \mid A(t) = s\}\]

32. Determine the long-run proportion of time that \(X_{N(t)+1} < c\).

33. In Example 7.14, find the long-run proportion of time that the server is busy. 

34. An M/G/\(\infty\) queueing system is cleaned at the fixed times T, 2T, 3T, .... All
customers in service when a cleaning begins are forced to leave early and a cost
\(C_1\) is incurred for each customer. Suppose that a cleaning takes time T/4, and
that all customers who arrive while the system is being cleaned are lost, and a
cost \(C_2\) is incurred for each one.

(a) Find the long-run average cost per unit time. 

(b) Find the long-run proportion of time the system is being cleaned. 

*35. Satellites are launched according to a Poisson process with rate \(\lambda\). Each satellite
will, independently, orbit the earth for a random time having distribution F. Let
X(t) denote the number of satellites orbiting at time t.

(a) Determine \(P\{X(t) = k\}\).

Hint: Relate this to the M/G/\(\infty\) queue.

(b) If at least one satellite is orbiting, then messages can be transmitted and we
say that the system is functional. If the first satellite is orbited at time t = 0,
determine the expected time that the system remains functional. 

Hint: Make use of part (a) when k = 0. 

36. Each of n skiers continually, and independently, climbs up and then skis down a 
particular slope. The time it takes skier i to climb up has distribution \(F_i\), and it is
independent of her time to ski down, which has distribution \(H_i\), i = 1, ..., n. Let
N(t) denote the total number of times members of this group have skied down
the slope by time t. Also, let U(t) denote the number of skiers climbing up the
hill at time t.

(a) What is \(\lim_{t \to \infty} N(t)/t\)?

(b) Find \(\lim_{t \to \infty} E[U(t)]\).

(c) If all \(F_i\) are exponential with rate \(\lambda\) and all \(H_i\) are exponential with rate \(\mu\),
what is \(P\{U(t) = k\}\)?

37. There are three machines, all of which are needed for a system to work. Machine
i functions for an exponential time with rate \(\lambda_i\) before it fails, i = 1, 2, 3. When
a machine fails, the system is shut down and repair begins on the failed machine.
The time to fix machine 1 is exponential with rate 5; the time to fix machine 2 is
uniform on (0, 4); and the time to fix machine 3 is a gamma random variable with
parameters n = 3 and \(\lambda = 2\). Once a failed machine is repaired, it is as good as
new and all machines are restarted. 

(a) What proportion of time is the system working? 

(b) What proportion of time is machine 1 being repaired? 

(c) What proportion of time is machine 2 in a state of suspended animation (that 
is, neither working nor being repaired)? 

38. A truck driver regularly drives round trips from A to B and then back to A. Each 
time he drives from A to B, he drives at a fixed speed that (in miles per hour) is 
uniformly distributed between 40 and 60; each time he drives from B to A, he 
drives at a fixed speed that is equally likely to be either 40 or 60. 

(a) In the long run, what proportion of his driving time is spent going to B? 

(b) In the long run, for what proportion of his driving time is he driving at a speed 
of 40 miles per hour? 

39. A system consists of two independent machines that each function for an
exponential time with rate \(\lambda\). There is a single repairperson. If the repairperson is idle
when a machine fails, then repair immediately begins on that machine; if the
repairperson is busy when a machine fails, then that machine must wait until the
other machine has been repaired. All repair times are independent with distribution
function G and, once repaired, a machine is as good as new. What proportion
of time is the repairperson idle? 

40. Three marksmen take turns shooting at a target. Marksman 1 shoots until he 
misses, then marksman 2 begins shooting until he misses, then marksman 3 until 
he misses, and then back to marksman 1, and so on. Each time marksman i fires 
he hits the target, independently of the past, with probability \(P_i\), i = 1, 2, 3.
Determine the proportion of time, in the long run, that each marksman shoots. 

41. Each time a certain machine breaks down it is replaced by a new one of the same 
type. In the long run, what percentage of time is the machine in use less than one 
year old if the life distribution of a machine is 

(a) uniformly distributed over (0, 2)? 

(b) exponentially distributed with mean 1?

*42. For an interarrival distribution F having mean \(\mu\), we defined the equilibrium
distribution of F, denoted \(F_e\), by

\[F_e(x) = \frac{1}{\mu} \int_0^x [1 - F(y)]\,dy\]

(a) Show that if F is an exponential distribution, then \(F = F_e\).

(b) If for some constant c,

\[F(x) = \begin{cases} 0, & x < c \\ 1, & x \ge c \end{cases}\]

show that \(F_e\) is the uniform distribution on (0, c). That is, if interarrival times
are identically equal to c, then the equilibrium distribution is the uniform
distribution on the interval (0, c).

(c) The city of Berkeley, California, allows for two hours parking at all non- 
metered locations within one mile of the University of California. Parking 
officials regularly tour around, passing the same point every two hours. When 
an official encounters a car he or she marks it with chalk. If the same car is 
there on the official’s return two hours later, then a parking ticket is written. 
If you park your car in Berkeley and return after three hours, what is the 
probability you will have received a ticket? 

43. Consider a renewal process having interarrival distribution F such that

\[\bar{F}(x) = 1 - F(x) = \tfrac{1}{2} e^{-x} + \tfrac{1}{2} e^{-x/2}, \quad x > 0\]

That is, interarrivals are equally likely to be exponential with mean 1 or
exponential with mean 2.

(a) Without any calculations, guess the equilibrium distribution \(F_e\).

(b) Verify your guess in part (a). 

*44. In Example 7.20, let \(\pi\) denote the proportion of passengers that wait less than x
for a bus to arrive. That is, with \(W_i\) equal to the waiting time of passenger i, if
we define

\[X_i = \begin{cases} 1, & \text{if } W_i < x \\ 0, & \text{if } W_i \ge x \end{cases}\]

then \(\pi = \lim_{n \to \infty} \sum_{i=1}^{n} X_i / n\).

(a) With N equal to the number of passengers that get on the bus, use renewal
reward process theory to argue that

\[\pi = \frac{E[X_1 + \cdots + X_N]}{E[N]}\]

(b) With T equal to the time between successive buses, determine
\(E[X_1 + \cdots + X_N \mid T = t]\).

(c) Show that \(E[X_1 + \cdots + X_N] = \lambda E[\min(T, x)]\).

(d) Show that

\[\pi = \frac{\int_0^x P(T > t)\,dt}{E[T]} = F_e(x)\]

(e) Using that \(F_e(x)\) is the proportion of time that the excess of a renewal process
with interarrival times distributed according to T is less than x, relate the 
result of (d) to the PASTA principle that “Poisson arrivals see the system as 
it averages over time”. 

45. Consider a system that can be in either state 1 or 2 or 3. Each time the system 
enters state i it remains there for a random amount of time having mean \(\mu_i\) and
then makes a transition into state j with probability \(P_{ij}\). Suppose

\[P_{12} = 1, \quad P_{21} = P_{23} = \tfrac{1}{2}, \quad P_{31} = 1\]

(a) What proportion of transitions takes the system into state 1?

(b) If \(\mu_1 = 1\), \(\mu_2 = 2\), \(\mu_3 = 3\), then what proportion of time does the system
spend in each state? 

46. Consider a semi-Markov process in which the amount of time that the process 
spends in each state before making a transition into a different state is
exponentially distributed. What kind of process is this?

47. In a semi-Markov process, let \(t_{ij}\) denote the conditional expected time that the
process spends in state i given that the next state is j.

(a) Present an equation relating \(\mu_i\) to the \(t_{ij}\).

(b) Show that the proportion of time the process is in i and will next enter j is
equal to \(P_i P_{ij} t_{ij} / \mu_i\).

Hint: Say that a cycle begins each time state i is entered. Imagine that you 
receive a reward at a rate of 1 per unit time whenever the process is in i and 
heading for j . What is the average reward per unit time? 

48. A taxi alternates between three different locations. Whenever it reaches location 
i, it stops and spends a random time having mean \(t_i\) before obtaining another
passenger, i = 1, 2, 3. A passenger entering the cab at location i will want to
go to location j with probability \(P_{ij}\). The time to travel from i to j is a random
variable with mean \(m_{ij}\). Suppose that \(t_1 = 1\), \(t_2 = 2\), \(t_3 = 4\), \(P_{12} = 1\), \(P_{23} = 1\),
\(P_{31} = \tfrac{2}{3} = 1 - P_{32}\), \(m_{12} = 10\), \(m_{23} = 20\), \(m_{31} = 15\), \(m_{32} = 25\). Define an
appropriate semi-Markov process and determine 

(a) the proportion of time the taxi is waiting at location i, and 

(b) the proportion of time the taxi is on the road from i to j, i, j = 1, 2, 3. 

*49. Consider a renewal process having the gamma \((n, \lambda)\) interarrival distribution,
and let Y(t) denote the time from t until the next renewal. Use the theory of
semi-Markov processes to show that

\[\lim_{t \to \infty} P\{Y(t) < x\} = \frac{1}{n} \sum_{i=1}^{n} G_{i,\lambda}(x)\]

where \(G_{i,\lambda}(x)\) is the gamma \((i, \lambda)\) distribution function.

50. To prove Equation (7.24), define the following notation:

\(X_i^j\) = time spent in state i on the jth visit to this state;

\(N_i(m)\) = number of visits to state i in the first m transitions

In terms of this notation, write expressions for

(a) the amount of time during the first m transitions that the process is in state i;

(b) the proportion of time during the first m transitions that the process is in
state i.

Argue that, with probability 1,

(c) \(\sum_{j=1}^{N_i(m)} X_i^j / N_i(m) \to \mu_i\) as \(m \to \infty\)

(d) \(N_i(m)/m \to \pi_i\) as \(m \to \infty\).

(e) Combine parts (a), (b), (c), and (d) to prove Equation (7.24). 

51. In 1984 the country of Morocco in an attempt to determine the average amount 
of time that tourists spend in that country on a visit tried two different sampling 
procedures. In one, they questioned randomly chosen tourists as they were
leaving the country; in the other, they questioned randomly chosen guests at hotels.
(Each tourist stayed at a hotel.) The average visiting time of the 3000 tourists 
chosen from hotels was 17.8, whereas the average visiting time of the 12,321 
tourists questioned at departure was 9.0. Can you explain this discrepancy? Does 
it necessarily imply a mistake? 

52. In Example 7.20, show that if F is exponential with rate \(\mu\), then

Average Number Waiting = E[N]

That is, when buses arrive according to a Poisson process, the average number of
people waiting at the stop, averaged over all time, is equal to the average number
of passengers waiting when a bus arrives. This may seem counterintuitive because
the number of people waiting when the bus arrives is at least as large as the number
waiting at any time in that cycle.

(b) Can you think of an inspection paradox type explanation for how such a result 
could be possible? 

(c) Explain how this result follows from the PASTA principle. 

53. If a coin that comes up heads with probability p is continually flipped until the 
pattern HTHTHTH appears, find the expected number of flips that land heads. 

54. Let \(X_i, i \ge 1\), be independent random variables with \(p_j = P\{X = j\}\), \(j \ge 1\). If
\(p_j = j/10\), j = 1, 2, 3, 4, find the expected time and the variance of the number
of variables that need be observed until the pattern 1, 2, 3, 1, 2 occurs.

55. A coin that comes up heads with probability 0.6 is continually flipped. Find the 
expected number of flips until either the sequence thht or the sequence ttt occurs, 
and find the probability that ttt occurs first.

56. Random digits, each of which is equally likely to be any of the digits 0 through
9, are observed in sequence. 

(a) Find the expected time until a run of 10 distinct values occurs. 

(b) Find the expected time until a run of 5 distinct values occurs. 

57. Let \(h(x) = P\{\sum_{i=1}^{T} X_i > x\}\) where \(X_1, X_2, \ldots\) are independent random
variables having distribution function \(F_e\) and T is independent of the \(X_i\) and has
probability mass function \(P\{T = n\} = p^n (1 - p)\), \(n \ge 0\). Show that h(x)
satisfies Equation (7.53).

Hint: Start by conditioning on whether T = 0 or T > 0.


Queueing Theory



8.1 Introduction 

In this chapter we will study a class of models in which customers arrive in some random 
manner at a service facility. Upon arrival they are made to wait in queue until it is their 
turn to be served. Once served they are generally assumed to leave the system. For such 
models we will be interested in determining, among other things, such quantities as 
the average number of customers in the system (or in the queue) and the average time 
a customer spends in the system (or spends waiting in the queue). 

In Section 8.2 we derive a series of basic queueing identities that are of great use 
in analyzing queueing models. We also introduce three different sets of limiting
probabilities that correspond to what an arrival sees, what a departure sees, and what an
outside observer would see. 

In Section 8.3 we deal with queueing systems in which all of the defining probability 
distributions are assumed to be exponential. For instance, the simplest such model is 
to assume that customers arrive in accordance with a Poisson process (and thus the 
interarrival times are exponentially distributed) and are served one at a time by a single 
server who takes an exponentially distributed length of time for each service. These 
exponential queueing models are special examples of continuous-time Markov chains 
and so can be analyzed as in Chapter 6. However, at the cost of a (very) slight amount 
of repetition we shall not assume that you are familiar with the material of Chapter 6, 
but rather we shall redevelop any needed material. Specifically we shall derive anew 
(by a heuristic argument) the formula for the limiting probabilities. 

In Section 8.4 we consider models in which customers move randomly among a 
network of servers. The model of Section 8.4.1 is an open system in which customers
are allowed to enter and depart the system, whereas the one studied in Section 8.4.2 is
closed in the sense that the set of customers in the system is constant over time. 

In Section 8.5 we study the model M/G/1, which while assuming Poisson arrivals,
allows the service distribution to be arbitrary. To analyze this model we first introduce 
in Section 8.5.1 the concept of work, and then use this concept in Section 8.5.2 to help 
analyze this system. In Section 8.5.3 we derive the average amount of time that a server 
remains busy between idle periods. 

In Section 8.6 we consider some variations of the model M/G/1. In particular in
Section 8.6.1 we suppose that bus loads of customers arrive according to a Poisson 
process and that each bus contains a random number of customers. In Section 8.6.2 
we suppose that there are two different classes of customers—with type 1 customers 
receiving service priority over type 2. 

In Section 8.6.3 we present an M/G/1 optimization example. We suppose that the
server goes on break whenever she becomes idle, and then determine, under certain 
cost assumptions, the optimal time for her to return to service. 

In Section 8.7 we consider a model with exponential service times but where the 
interarrival times between customers are allowed to have an arbitrary distribution. We
analyze this model by use of an appropriately defined Markov chain. We also derive 
the mean length of a busy period and of an idle period for this model. 

In Section 8.8 we consider a single-server system whose arrival process results 
from return visits of a finite number of possible sources. Assuming a general service 
distribution, we show how a Markov chain can be used to analyze this system. 

In the final section of the chapter we talk about multiserver systems. We start with 
loss systems, in which arrivals finding all servers busy are assumed to depart and as such 
are lost to the system. This leads to the famous result known as Erlang’s loss formula, 
which presents a simple formula for the number of busy servers in such a model when 
the arrival process is Poisson and the service distribution is general. We then discuss
multiserver systems in which queues are allowed. However, except in the case where 
exponential service times are assumed, there are very few explicit formulas for these 
models. We end by presenting an approximation for the average time a customer waits 
in queue in a k-server model that assumes Poisson arrivals but allows for a general
service distribution. 


8.2 Preliminaries 

In this section we will derive certain identities that are valid in the great majority of 
queueing models. 

8.2.1 Cost Equations 

Some fundamental quantities of interest for queueing models are

L, the average number of customers in the system;

\(L_Q\), the average number of customers waiting in queue;

W, the average amount of time a customer spends in the system;

\(W_Q\), the average amount of time a customer spends waiting in queue.

A large number of interesting and useful relationships between the preceding and 
other quantities of interest can be obtained by making use of the following idea: Imagine 
that entering customers are forced to pay money (according to some rule) to the system. 
We would then have the following basic cost identity: 

\[\text{average rate at which the system earns} = \lambda_a \times \text{average amount an entering customer pays} \qquad (8.1)\]

where \(\lambda_a\) is defined to be the average arrival rate of entering customers. That is, if N(t)
denotes the number of customer arrivals by time t, then

\[\lambda_a = \lim_{t \to \infty} \frac{N(t)}{t}\]

We now present a heuristic proof of Equation (8.1). 

Heuristic Proof of Equation (8.1). Let T be a fixed large number. In two different 
ways, we will compute the average amount of money the system has earned by time 
T. On one hand, this quantity can be obtained approximately by multiplying the
average rate at which the system earns by the length of time T. On the other hand, we
can approximately compute it by multiplying the average amount paid by an entering
customer by the average number of customers entering by time T (this latter factor is
approximately \(\lambda_a T\)). Hence, both sides of Equation (8.1) when multiplied by T are
approximately equal to the average amount earned by T. The result then follows by
letting \(T \to \infty\).*

By choosing appropriate cost rules, many useful formulas can be obtained as special 
cases of Equation (8.1). For instance, by supposing that each customer pays $1 per unit 
time while in the system, Equation (8.1) yields the so-called Little’s formula, 

\[L = \lambda_a W \qquad (8.2)\]

This follows since, under this cost rule, the rate at which the system earns is just the 
number in the system, and the amount a customer pays is just equal to its time in the 
system. 

Similarly if we suppose that each customer pays $1 per unit time while in queue, 
then Equation (8.1) yields 

\[L_Q = \lambda_a W_Q \qquad (8.3)\]

By supposing the cost rule that each customer pays $1 per unit time while in service 
we obtain from Equation (8.1) that the 

average number of customers in service \(= \lambda_a E[S]\) (8.4)

where E[S] is defined as the average amount of time a customer spends in service. 

* This can be made into a rigorous proof provided we assume that the queueing process is regenerative in 
the sense of Section 7.5. Most models, including all the ones in this chapter, satisfy this condition.

It should be emphasized that Equations (8.1) through (8.4) are valid for almost all
queueing models regardless of the arrival process, the number of servers, or queue 
discipline. ■ 
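
Little's formula can be checked on a single sample path. The Python sketch below is illustrative only: the function names and the small data set are assumptions, not taken from the text. It runs a first-come, first-served single-server queue and measures L, \(\lambda_a\), and W over the interval from time 0 to the last departure. Because the system is empty at both ends of that interval, the area under the number-in-system curve equals the total time customers spend in the system, so the two sides of Equation (8.2) agree exactly on this finite path.

```python
def fifo_single_server(arrivals, services):
    """Departure times for a first-come, first-served single-server queue."""
    departures = []
    free_at = 0.0
    for a, s in zip(arrivals, services):
        start = max(a, free_at)      # wait if the server is still busy
        free_at = start + s
        departures.append(free_at)
    return departures


def little_quantities(arrivals, departures):
    """Return (L, lambda_a, W) measured over [0, T], with T the last departure."""
    horizon = max(departures)
    n = len(arrivals)
    # +1 at each arrival, -1 at each departure; sweep to integrate the number in system.
    events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in departures])
    area, count, last = 0.0, 0, 0.0
    for t, step in events:
        area += count * (t - last)
        count += step
        last = t
    L = area / horizon                                       # time-average number in system
    lam_a = n / horizon                                      # arrival rate over [0, T]
    W = sum(d - a for a, d in zip(arrivals, departures)) / n # average time in system
    return L, lam_a, W


# Hypothetical data, chosen only for illustration.
arrivals = [0.0, 1.0, 1.5, 4.0, 4.2]
services = [2.0, 1.0, 0.5, 1.0, 3.0]
departures = fifo_single_server(arrivals, services)
L, lam_a, W = little_quantities(arrivals, departures)
print(L, lam_a * W)
```

The same sweep works for any queue discipline or number of servers, since only the arrival and departure times enter the calculation; this mirrors the remark that the cost identities do not depend on those details.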

8.2.2 Steady-State Probabilities 

Let X(t) denote the number of customers in the system at time t and define \(P_n\), \(n \ge 0\), by

\[P_n = \lim_{t \to \infty} P\{X(t) = n\}\]

where we assume the preceding limit exists. In other words, \(P_n\) is the limiting or
long-run probability that there will be exactly n customers in the system. It is sometimes
referred to as the steady-state probability of exactly n customers in the system. It also
usually turns out that \(P_n\) equals the (long-run) proportion of time that the system
contains exactly n customers. For example, if \(P_0 = 0.3\), then in the long run, the system
will be empty of customers for 30 percent of the time. Similarly, \(P_1 = 0.2\) would imply
that for 20 percent of the time the system would contain exactly one customer.*

Two other sets of limiting probabilities are \(\{a_n, n \ge 0\}\) and \(\{d_n, n \ge 0\}\), where

\(a_n\) = proportion of customers that find n
in the system when they arrive, and

\(d_n\) = proportion of customers leaving behind n
in the system when they depart

That is, \(P_n\) is the proportion of time during which there are n in the system; \(a_n\) is
the proportion of arrivals that find n; and \(d_n\) is the proportion of departures that leave
behind n. That these quantities need not always be equal is illustrated by the following 
example. 

Example 8.1 Consider a queueing model in which all customers have service times 
equal to 1, and where the times between successive customers are always greater than 1 
(for instance, the interarrival times could be uniformly distributed over (1, 2)). Hence, 
as every arrival finds the system empty and every departure leaves it empty, we have 

\[a_0 = d_0 = 1\]

However,

\[P_0 \neq 1\]

as the system is not always empty of customers. ■
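
A small simulation makes the gap between \(a_0\) and \(P_0\) concrete. The sketch below is an illustration under stated assumptions: interarrival times are taken to be uniform on (1, 2), as in the parenthetical remark of the example, and the function name and seed are arbitrary. Since each customer occupies the server for exactly 1 time unit out of an average cycle of 1.5, the long-run proportion of time the system is empty should be near 1 − 1/1.5 = 1/3, even though every arrival finds the system empty.

```python
import random


def example_8_1(num_customers, seed=0):
    """Estimate a_0 and P_0 when service times equal 1 and interarrivals are uniform(1, 2)."""
    rng = random.Random(seed)
    t = 0.0
    last_departure = 0.0          # the system starts empty
    empty_on_arrival = 0
    for _ in range(num_customers):
        t += rng.uniform(1.0, 2.0)        # interarrival time, at least 1
        if t >= last_departure:
            empty_on_arrival += 1         # this arrival finds nobody in the system
        last_departure = t + 1.0          # service time is exactly 1
    total_time = last_departure
    busy_time = float(num_customers)      # services never overlap, 1 unit each
    a0 = empty_on_arrival / num_customers
    p0 = 1.0 - busy_time / total_time
    return a0, p0


a0, p0 = example_8_1(100_000)
print(a0, p0)
```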

It was, however, no accident that \(a_n\) equaled \(d_n\) in the previous example. That arrivals
and departures see the same number of customers is always true, as is shown in
the next proposition.


* A sufficient condition for the validity of the dual interpretation of \(P_n\) is that the queueing process be
regenerative.

Proposition 8.1 In any system in which customers arrive and depart one at a time

the rate at which arrivals find n = the rate at which departures leave n

and

\[a_n = d_n\]

Proof. An arrival will see n in the system whenever the number in the system goes 
from n to n + 1; similarly, a departure will leave behind n whenever the number in the 
system goes from n + 1 to n. Now in any interval of time T the number of transitions 
from n to n + 1 must equal to within 1 the number from n + 1 to n. (Between any 
two transitions from n to n + 1, there must be one from n + 1 to n, and conversely.) 
Hence, the rate of transitions from n to n + 1 equals the rate from n + 1 to n; or, 
equivalently, the rate at which arrivals find n equals the rate at which departures leave 
n. Now \(a_n\), the proportion of arrivals finding n, can be expressed as

\[a_n = \frac{\text{the rate at which arrivals find } n}{\text{overall arrival rate}}\]

Similarly,

\[d_n = \frac{\text{the rate at which departures leave } n}{\text{overall departure rate}}\]

Thus, if the overall arrival rate is equal to the overall departure rate, then the preceding
shows that \(a_n = d_n\). On the other hand, if the overall arrival rate exceeds the overall
departure rate, then the queue size will go to infinity, implying that \(a_n = d_n = 0\). ■
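
The counting step in the proof can be illustrated numerically. The following Python sketch is only an illustration: it records the system size after each arrival or departure on an arbitrary path that moves by one at a time (here generated by a made-up random rule, not by any model in the text) and tallies, for every level n, the transitions from n to n + 1 and from n + 1 to n. The two tallies never differ by more than 1, whatever rule generates the path.

```python
import random


def crossing_counts(path):
    """Count n -> n+1 and n+1 -> n transitions on a path of system sizes."""
    up, down = {}, {}
    for before, after in zip(path, path[1:]):
        if after == before + 1:
            up[before] = up.get(before, 0) + 1     # an arrival found `before` in the system
        elif after == before - 1:
            down[after] = down.get(after, 0) + 1   # a departure left `after` behind
        else:
            raise ValueError("customers must arrive and depart one at a time")
    return up, down


# Hypothetical path: from an empty system the next event is an arrival;
# otherwise an arrival occurs with probability 0.45 (an arbitrary choice).
rng = random.Random(1)
path = [0]
for _ in range(10_000):
    step = 1 if path[-1] == 0 or rng.random() < 0.45 else -1
    path.append(path[-1] + step)

up, down = crossing_counts(path)
levels = set(up) | set(down)
max_gap = max(abs(up.get(n, 0) - down.get(n, 0)) for n in levels)
print(max_gap)
```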

Hence, on the average, arrivals and departures always see the same number of
customers. However, as Example 8.1 illustrates, they do not, in general, see time averages.
One important exception where they do is in the case of Poisson arrivals. 

Proposition 8.2 Poisson arrivals always see time averages. In particular, for Poisson
arrivals,

\[P_n = a_n\]

To understand why Poisson arrivals always see time averages, consider an arbitrary 
Poisson arrival. If we knew that it arrived at time t , then the conditional distribution of 
what it sees upon arrival is the same as the unconditional distribution of the system state 
at time t. For knowing that an arrival occurs at time t gives us no information about what 
occurred prior to t. (Since the Poisson process has independent increments, knowing 
that an event occurred at time t does not affect the distribution of what occurred prior 
to t.) Hence, an arrival would just see the system according to the limiting probabilities. 

Contrast the foregoing with the situation of Example 8.1 where knowing that an 
arrival occurred at time t tells us a great deal about the past; in particular it tells us that 
there have been no arrivals in \((t - 1, t)\). Thus, in this case, we cannot conclude that the
distribution of what an arrival at time t observes is the same as the distribution of the 
system state at time t.

For a second argument as to why Poisson arrivals see time averages, note that the
total time the system is in state n by time T is (roughly) \(P_n T\). Hence, as Poisson arrivals
always arrive at rate \(\lambda\) no matter what the system state, it follows that the number of
arrivals in [0, T] that find the system in state n is (roughly) \(\lambda P_n T\). In the long run,
therefore, the rate at which arrivals find the system in state n is \(\lambda P_n\) and, as \(\lambda\) is the
overall arrival rate, it follows that \(\lambda P_n / \lambda = P_n\) is the proportion of arrivals that find
the system in state n. 

The result that Poisson arrivals see time averages is called the PASTA principle. 

Example 8.2 People arrive at a bus stop according to a Poisson process with rate \(\lambda\).
Buses arrive at the stop according to a Poisson process with rate \(\mu\), with each arriving
bus picking up all the currently waiting people. Let \(W_Q\) be the average amount of time
that a person waits at the stop for a bus. Because the waiting time of each person is equal
to the time from when they arrive until the next bus, which is exponentially distributed
with rate \(\mu\), we see that

\[W_Q = 1/\mu\]

Using \(L_Q = \lambda_a W_Q\) now shows that \(L_Q\), the average number of people waiting at the
bus stop, averaged over all time, is

\[L_Q = \lambda/\mu\]

If we let \(X_i\) be the number of people picked up by the ith bus, then with \(T_i\) equal to the
time between the \((i - 1)\)st and the ith bus arrival,

\[E[X_i \mid T_i] = \lambda T_i\]

which follows because the number of people that arrive at the stop in any time interval is
Poisson with a mean equal to \(\lambda\) times the length of the interval. Because \(T_i\) is exponential
with rate \(\mu\), it follows upon taking expectations of both sides of the preceding that

\[E[X_i] = \lambda E[T_i] = \lambda/\mu\]

Thus, the average number of people picked up by a bus is equal to the time average 
number of people waiting for a bus, an illustration of the PASTA principle. That is, 
because buses arrive according to a Poisson process, it follows from PASTA that the 
average number of waiting people seen by arriving buses is the same as the average 
number of people waiting when we average over all time. ■ 
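
The three quantities in this example can be estimated by simulation. The Python sketch below is illustrative: the function name, the parameter values \(\lambda = 3\) and \(\mu = 1\), and the seed are assumptions made for the demonstration. For each bus it draws the time until that bus arrives and then generates the people who arrive in that gap; restarting the person arrival process at each bus is justified by the memoryless property of the exponential distribution.

```python
import random


def bus_stop(lam, mu, num_buses, seed=0):
    """Simulate Example 8.2 and return three estimates.

    Returns (average number picked up per bus,
             time-average number waiting,
             average wait per person).
    """
    rng = random.Random(seed)
    picked_up = 0        # total people picked up
    total_wait = 0.0     # total person-time spent waiting
    total_time = 0.0
    for _ in range(num_buses):
        gap = rng.expovariate(mu)        # time until the next bus
        t = rng.expovariate(lam)         # first person to arrive in this gap
        while t < gap:
            picked_up += 1
            total_wait += gap - t        # this person waits until the bus comes
            t += rng.expovariate(lam)
        total_time += gap
    per_bus = picked_up / num_buses
    time_avg = total_wait / total_time
    avg_wait = total_wait / picked_up
    return per_bus, time_avg, avg_wait


per_bus, time_avg, avg_wait = bus_stop(lam=3.0, mu=1.0, num_buses=50_000)
print(per_bus, time_avg, avg_wait)
```

With these parameters the first two estimates should both be close to \(\lambda/\mu = 3\) and the third close to \(1/\mu = 1\), up to simulation noise.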


8.3 Exponential Models 

8.3.1 A Single-Server Exponential Queueing System 

Suppose that customers arrive at a single-server service station in accordance with 
a Poisson process having rate \(\lambda\). That is, the times between successive arrivals are
independent exponential random variables having mean \(1/\lambda\). Each customer, upon
arrival, goes directly into service if the server is free and, if not, the customer joins the
queue. When the server finishes serving a customer, the customer leaves the system,
and the next customer in line, if there is any, enters service. The successive service
times are assumed to be independent exponential random variables having mean \(1/\mu\).

The preceding is called the M/M/1 queue. The two Ms refer to the fact that both 
the interarrival and the service distributions are exponential (and thus memoryless, or 
Markovian), and the 1 to the fact that there is a single server. To analyze it, we shall 
begin by determining the limiting probabilities \(P_n\), for n = 0, 1, .... To do so, think
along the following lines. Suppose that we have an infinite number of rooms numbered
0, 1, 2, ..., and suppose that we instruct an individual to enter room n whenever there
are n customers in the system. That is, he would be in room 2 whenever there are two 
customers in the system; and if another were to arrive, then he would leave room 2 and 
enter room 3. Similarly, if a service would take place he would leave room 2 and enter 
room 1 (as there would now be only one customer in the system). 

Now suppose that in the long run our individual is seen to have entered room 1 at 
the rate of ten times an hour. Then at what rate must he have left room 1? Clearly, at
this same rate of ten times an hour. For the total number of times that he enters room 1 
must be equal to (or one greater than) the total number of times he leaves room 1. This 
sort of argument thus yields the general principle that will enable us to determine the 
state probabilities. Namely, for each \(n \ge 0\), the rate at which the process enters state
n equals the rate at which it leaves state n. Let us now determine these rates. Consider
first state 0. When in state 0 the process can leave only by an arrival as clearly there
cannot be a departure when the system is empty. Since the arrival rate is \(\lambda\) and the
proportion of time that the process is in state 0 is \(P_0\), it follows that the rate at which the
process leaves state 0 is \(\lambda P_0\). On the other hand, state 0 can only be reached from state
1 via a departure. That is, if there is a single customer in the system and he completes
service, then the system becomes empty. Since the service rate is \(\mu\) and the proportion
of time that the system has exactly one customer is \(P_1\), it follows that the rate at which
the process enters state 0 is \(\mu P_1\).

Hence, from our rate-equality principle we get our first equation,

\[\lambda P_0 = \mu P_1\]


Now consider state 1. The process can leave this state either by an arrival (which occurs
at rate \(\lambda\)) or a departure (which occurs at rate \(\mu\)). Hence, when in state 1, the process
will leave this state at a rate of \(\lambda + \mu\).* Since the proportion of time the process is in
state 1 is \(P_1\), the rate at which the process leaves state 1 is \((\lambda + \mu) P_1\). On the other hand,
state 1 can be entered either from state 0 via an arrival or from state 2 via a departure.
Hence, the rate at which the process enters state 1 is \(\lambda P_0 + \mu P_2\). Because the reasoning
for other states is similar, we obtain the following set of equations:


* If one event occurs at a rate $\lambda$ and another occurs at rate $\mu$, then the total rate at which either event occurs
is $\lambda + \mu$. Suppose one man earns \$2 per hour and another earns \$3 per hour; then together they clearly earn
\$5 per hour.

State            Rate at which the process leaves = rate at which it enters
0                $\lambda P_0 = \mu P_1$
$n,\ n \geq 1$   $(\lambda + \mu)P_n = \lambda P_{n-1} + \mu P_{n+1}$                    (8.5)


Equations (8.5), which balance the rate at which the process enters each state with the
rate at which it leaves that state, are known as balance equations.

In order to solve Equations (8.5), we rewrite them to obtain 


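$$
\begin{aligned}
P_1 &= \frac{\lambda}{\mu} P_0, \\
P_{n+1} &= \frac{\lambda}{\mu} P_n + \left(P_n - \frac{\lambda}{\mu} P_{n-1}\right), \quad n \geq 1
\end{aligned}
$$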


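Solving in terms of $P_0$ yields

$$
\begin{aligned}
P_0 &= P_0, \\
P_1 &= \frac{\lambda}{\mu} P_0, \\
P_2 &= \frac{\lambda}{\mu} P_1 + \left(P_1 - \frac{\lambda}{\mu} P_0\right) = \frac{\lambda}{\mu} P_1 = \left(\frac{\lambda}{\mu}\right)^2 P_0, \\
P_3 &= \frac{\lambda}{\mu} P_2 + \left(P_2 - \frac{\lambda}{\mu} P_1\right) = \frac{\lambda}{\mu} P_2 = \left(\frac{\lambda}{\mu}\right)^3 P_0, \\
P_4 &= \frac{\lambda}{\mu} P_3 + \left(P_3 - \frac{\lambda}{\mu} P_2\right) = \frac{\lambda}{\mu} P_3 = \left(\frac{\lambda}{\mu}\right)^4 P_0, \\
P_{n+1} &= \frac{\lambda}{\mu} P_n + \left(P_n - \frac{\lambda}{\mu} P_{n-1}\right) = \frac{\lambda}{\mu} P_n = \left(\frac{\lambda}{\mu}\right)^{n+1} P_0
\end{aligned}
$$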


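To determine $P_0$ we use the fact that the $P_n$ must sum to 1, and thus

$$1 = \sum_{n=0}^{\infty} P_n = \sum_{n=0}^{\infty} \left(\frac{\lambda}{\mu}\right)^n P_0 = \frac{P_0}{1 - \lambda/\mu}$$

or

$$
\begin{aligned}
P_0 &= 1 - \frac{\lambda}{\mu}, \\
P_n &= \left(\frac{\lambda}{\mu}\right)^n \left(1 - \frac{\lambda}{\mu}\right), \quad n \geq 1
\end{aligned}
\tag{8.6}
$$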
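
As a quick numerical sanity check on Equation (8.6), the following minimal sketch computes the $P_n$ for one pair of rates and confirms that they satisfy the balance equations (8.5) and sum to 1. The function name `mm1_probs`, the truncation point of 60 states, and the rates $\lambda = 1$, $\mu = 2$ are illustrative assumptions, not taken from the text.

```python
def mm1_probs(lam, mu, n_max):
    """Return [P_0, ..., P_{n_max}] from Equation (8.6); requires lam < mu."""
    if not lam < mu:
        raise ValueError("limiting probabilities exist only when lam < mu")
    rho = lam / mu
    return [(1 - rho) * rho**n for n in range(n_max + 1)]

lam, mu = 1.0, 2.0          # illustrative rates (assumed, not from the text)
P = mm1_probs(lam, mu, 60)

# Balance equation for state 0: lam * P_0 = mu * P_1
state0_gap = abs(lam * P[0] - mu * P[1])

# Balance equations for states n >= 1:
# (lam + mu) * P_n = lam * P_{n-1} + mu * P_{n+1}
max_gap = max(
    abs((lam + mu) * P[n] - lam * P[n - 1] - mu * P[n + 1])
    for n in range(1, 60)
)

# The truncated sum is close to 1 because the tail beyond n_max is negligible.
total = sum(P)
```

Both gaps should be zero up to floating-point rounding, and `total` should differ from 1 only by the neglected geometric tail.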


Notice that for the preceding equations to make sense, it is necessary for $\lambda/\mu$ to be
less than 1. For otherwise $\sum_{n=0}^{\infty} (\lambda/\mu)^n$ would be infinite and all the $P_n$ would be 0.

Hence, we shall assume that $\lambda/\mu < 1$. Note that it is quite intuitive that there would
be no limiting probabilities if $\lambda > \mu$. For suppose that $\lambda > \mu$. Since customers arrive
at a Poisson rate $\lambda$, it follows that the expected total number of arrivals by time $t$ is $\lambda t$.
On the other hand, what is the expected number of customers served by time $t$? If
there were always customers present, then the number of customers served would be
a Poisson process having rate $\mu$, since the time between successive services would be
independent exponentials having mean $1/\mu$. Hence, the expected number of customers
served by time $t$ is no greater than $\mu t$; and, therefore, the expected number in the system
at time $t$ is at least

$$\lambda t - \mu t = (\lambda - \mu)t$$

Now, if $\lambda > \mu$, then the preceding number goes to infinity as $t$ becomes large. That
is, if $\lambda/\mu > 1$, the queue size increases without limit and there will be no limiting
probabilities. Note also that the condition $\lambda/\mu < 1$ is equivalent to the condition that
the mean service time be less than the mean time between successive arrivals. This is
the general condition that must be satisfied for limiting probabilities to exist in most
single-server queueing systems.

Remarks 

(i) In solving the balance equations for the M/M/1 queue, we obtained as an intermediate
step the set of equations

$$\lambda P_n = \mu P_{n+1}, \quad n \geq 0$$

These equations could have been directly argued from the general queueing result
(shown in Proposition 8.1) that the rate at which arrivals find $n$ in the system,
namely $\lambda P_n$, is equal to the rate at which departures leave behind $n$, namely
$\mu P_{n+1}$.

(ii) We can also prove that $P_n = (\lambda/\mu)^n (1 - \lambda/\mu)$ by using a queueing cost identity.
Suppose that, for a fixed $n > 0$, whenever there are at least $n$ customers in the system
the $n$th oldest customer (with age measured from when the customer arrived) pays
1 per unit time. Letting $X$ be the steady state number of customers in the system,
because the system earns 1 per unit time whenever $X$ is at least $n$, it follows that

$$\text{average rate at which the system earns} = P\{X \geq n\}$$

Also, because a customer who finds fewer than $n - 1$ in the system when it arrives
will pay 0, while an arrival who finds at least $n - 1$ in the system will pay 1 per
unit time for an exponentially distributed time with rate $\mu$,

$$\text{average amount a customer pays} = \frac{1}{\mu} P\{X \geq n - 1\}$$

Therefore, the queueing cost identity yields

$$P\{X \geq n\} = (\lambda/\mu) P\{X \geq n - 1\}, \quad n > 0$$

Iterating this gives

$$
\begin{aligned}
P\{X \geq n\} &= (\lambda/\mu) P\{X \geq n - 1\} \\
&= (\lambda/\mu)^2 P\{X \geq n - 2\} \\
&= \cdots \\
&= (\lambda/\mu)^n P\{X \geq 0\} \\
&= (\lambda/\mu)^n
\end{aligned}
$$

Therefore,

$$P\{X = n\} = P\{X \geq n\} - P\{X \geq n + 1\} = (\lambda/\mu)^n (1 - \lambda/\mu) \quad \blacksquare$$


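Now let us attempt to express the quantities $L$, $L_Q$, $W$, and $W_Q$ in terms of the
limiting probabilities $P_n$. Since $P_n$ is the long-run probability that the system contains
exactly $n$ customers, the average number of customers in the system clearly is given by

$$
\begin{aligned}
L &= \sum_{n=0}^{\infty} n P_n \\
&= \sum_{n=0}^{\infty} n \left(\frac{\lambda}{\mu}\right)^n \left(1 - \frac{\lambda}{\mu}\right) \\
&= \frac{\lambda}{\mu - \lambda}
\end{aligned}
\tag{8.7}
$$

where the last equation followed upon application of the algebraic identity

$$\sum_{n=0}^{\infty} n x^n = \frac{x}{(1 - x)^2}$$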

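The quantities $W$, $W_Q$, and $L_Q$ now can be obtained with the help of Equations
(8.2) and (8.3). That is, since $\lambda_a = \lambda$, we have from Equation (8.7) that

$$
\begin{aligned}
W &= \frac{L}{\lambda} = \frac{1}{\mu - \lambda}, \\
W_Q &= W - E[S] = W - \frac{1}{\mu} = \frac{\lambda}{\mu(\mu - \lambda)}, \\
L_Q &= \lambda W_Q = \frac{\lambda^2}{\mu(\mu - \lambda)}
\end{aligned}
\tag{8.8}
$$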
Example 8.3 Suppose that customers arrive at a Poisson rate of one per every 12
minutes, and that the service time is exponential at a rate of one service per 8 minutes.
What are $L$ and $W$?

Solution: Since $\lambda = \frac{1}{12}$, $\mu = \frac{1}{8}$, we have

$$L = 2, \quad W = 24$$

Hence, the average number of customers in the system is 2, and the average time a
customer spends in the system is 24 minutes.

Now suppose that the arrival rate increases 20 percent to $\lambda = \frac{1}{10}$. What is the
corresponding change in $L$ and $W$? Again using Equations (8.8), we get

$$L = 4, \quad W = 40$$

Hence, an increase of 20 percent in the arrival rate doubled the average number of
customers in the system.

To understand this better, write Equations (8.8) as

$$L = \frac{\lambda/\mu}{1 - \lambda/\mu}, \qquad W = \frac{1/\mu}{1 - \lambda/\mu}$$

From these equations we can see that when $\lambda/\mu$ is near 1, a slight increase in $\lambda/\mu$
will lead to a large increase in $L$ and $W$. ■
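
The arithmetic of Example 8.3 can be reproduced with exact fractions. The sketch below is only an illustration of the formulas $L = \lambda/(\mu - \lambda)$ and $W = 1/(\mu - \lambda)$; the helper name `mm1_L_W` is an assumption introduced here.

```python
from fractions import Fraction

def mm1_L_W(lam, mu):
    """Average number in system and average time in system for an M/M/1 queue."""
    L = lam / (mu - lam)
    W = 1 / (mu - lam)
    return L, W

mu = Fraction(1, 8)                      # one service per 8 minutes
L1, W1 = mm1_L_W(Fraction(1, 12), mu)    # original arrival rate
L2, W2 = mm1_L_W(Fraction(1, 10), mu)    # arrival rate increased by 20 percent
```

With these rates the first call gives $L = 2$, $W = 24$ and the second gives $L = 4$, $W = 40$, matching the example.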

A Technical Remark We have used the fact that if one event occurs at an exponential
rate $\lambda$, and another independent event at an exponential rate $\mu$, then together they occur
at an exponential rate $\lambda + \mu$. To check this formally, let $T_1$ be the time at which the
first event occurs, and $T_2$ the time at which the second event occurs. Then

$$
\begin{aligned}
P\{T_1 \leq t\} &= 1 - e^{-\lambda t}, \\
P\{T_2 \leq t\} &= 1 - e^{-\mu t}
\end{aligned}
$$

Now if we are interested in the time until either $T_1$ or $T_2$ occurs, then we are interested
in $T = \min(T_1, T_2)$. Now,

$$
\begin{aligned}
P\{T \leq t\} &= 1 - P\{T > t\} \\
&= 1 - P\{\min(T_1, T_2) > t\}
\end{aligned}
$$

However, $\min(T_1, T_2) > t$ if and only if both $T_1$ and $T_2$ are greater than $t$; hence,

$$
\begin{aligned}
P\{T \leq t\} &= 1 - P\{T_1 > t, T_2 > t\} \\
&= 1 - P\{T_1 > t\} P\{T_2 > t\} \\
&= 1 - e^{-\lambda t} e^{-\mu t} \\
&= 1 - e^{-(\lambda + \mu)t}
\end{aligned}
$$

Thus, $T$ has an exponential distribution with rate $\lambda + \mu$, and we are justified in adding
the rates. ■
Given that an M/M/1 steady-state customer (that is, a customer who arrives after
the system has been in operation a long time) spends a total of $t$ time units in the
system, let us determine the conditional distribution of $N$, the number of others that
were present when that customer arrived. That is, letting $W^*$ be the amount of time a
customer spends in the system, we will find $P\{N = n \mid W^* = t\}$. Now,

$$
\begin{aligned}
P\{N = n \mid W^* = t\} &= \frac{f_{N,W^*}(n, t)}{f_{W^*}(t)} \\
&= \frac{P\{N = n\} f_{W^*|N}(t \mid n)}{f_{W^*}(t)}
\end{aligned}
$$

where $f_{W^*|N}(t \mid n)$ is the conditional density of $W^*$ given that $N = n$, and $f_{W^*}(t)$ is
the unconditional density of $W^*$. Now, given that $N = n$, the time that the customer
spends in the system is distributed as the sum of $n + 1$ independent exponential random
variables with a common rate $\mu$, implying that the conditional distribution of $W^*$ given
that $N = n$ is the gamma distribution with parameters $n + 1$ and $\mu$. Therefore, with
$C = 1/f_{W^*}(t)$,

$$
\begin{aligned}
P\{N = n \mid W^* = t\} &= C P\{N = n\} \mu e^{-\mu t} \frac{(\mu t)^n}{n!} \\
&= C (\lambda/\mu)^n (1 - \lambda/\mu) \mu e^{-\mu t} \frac{(\mu t)^n}{n!} \quad \text{(by PASTA)} \\
&= K \frac{(\lambda t)^n}{n!}
\end{aligned}
$$

where $K = C(1 - \lambda/\mu)\mu e^{-\mu t}$ does not depend on $n$. Summing over $n$ yields

$$1 = \sum_{n=0}^{\infty} P\{N = n \mid W^* = t\} = K \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} = K e^{\lambda t}$$

Thus, $K = e^{-\lambda t}$, showing that

$$P\{N = n \mid W^* = t\} = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$$

Therefore, the conditional distribution of the number seen by an arrival who spends a
total of $t$ time units in the system is the Poisson distribution with mean $\lambda t$.

In addition, as a by-product of our analysis, we have

$$
\begin{aligned}
f_{W^*}(t) &= 1/C \\
&= \frac{1}{K} (1 - \lambda/\mu) \mu e^{-\mu t} \\
&= (\mu - \lambda) e^{-(\mu - \lambda)t}
\end{aligned}
$$

In other words, $W^*$, the amount of time a customer spends in the system, is an exponential
random variable with rate $\mu - \lambda$. (As a check, we note that $E[W^*] = 1/(\mu - \lambda)$,
which checks with Equation (8.8) since $W = E[W^*]$.)
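
The by-product result can be checked numerically by carrying out the conditioning directly: summing, over $n$, the probability that an arrival finds $n$ in the system times the gamma density with parameters $n + 1$ and $\mu$ should reproduce the exponential density with rate $\mu - \lambda$. The following is a minimal sketch; the function name, the truncation at 200 terms, and the values $\lambda = 1$, $\mu = 2$, $t = 1.5$ are assumptions chosen for illustration.

```python
import math

def sojourn_density_by_conditioning(lam, mu, t, terms=200):
    """Sum over n of P{N = n} times the gamma(n + 1, mu) density at t."""
    rho = lam / mu
    poisson_term = 1.0          # holds (mu*t)^n / n!, updated iteratively
    total = 0.0
    for n in range(terms):
        if n > 0:
            poisson_term *= mu * t / n
        p_n = (1 - rho) * rho**n                        # arrival finds n (PASTA)
        gamma_pdf = mu * math.exp(-mu * t) * poisson_term
        total += p_n * gamma_pdf
    return total

lam, mu, t = 1.0, 2.0, 1.5      # illustrative values (assumed)
approx = sojourn_density_by_conditioning(lam, mu, t)
exact = (mu - lam) * math.exp(-(mu - lam) * t)
```

The term $(\mu t)^n/n!$ is updated iteratively rather than computed from a factorial so that large $n$ does not overflow.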
Remark Another argument as to why $W^*$ is exponential with rate $\mu - \lambda$ is as follows.
If we let $N$ denote the number of customers in the system as seen by an arrival, then
this arrival will spend $N + 1$ service times in the system before departing. Now,

$$P\{N + 1 = j\} = P\{N = j - 1\} = (\lambda/\mu)^{j-1} (1 - \lambda/\mu), \quad j \geq 1$$

In words, the number of services that have to be completed before the arrival departs
is a geometric random variable with parameter $1 - \lambda/\mu$. Therefore, after each service
completion our customer will be the one departing with probability $1 - \lambda/\mu$. Thus, no
matter how long the customer has already spent in the system, the probability he will
depart in the next $h$ time units is $\mu h + o(h)$, the probability that a service ends in that
time, multiplied by $1 - \lambda/\mu$. That is, the customer will depart in the next $h$ time units
with probability $(\mu - \lambda)h + o(h)$, which says that the hazard rate function of $W^*$ is
the constant $\mu - \lambda$. But only the exponential has a constant hazard rate, and so we can
conclude that $W^*$ is exponential with rate $\mu - \lambda$.

Our next example illustrates the inspection paradox. 

Example 8.4 For an M/M/1 queue in steady state, what is the probability that the 
next arrival finds n in the system? 


Solution: Although it might initially seem, by the PASTA principle, that this
probability should just be $(\lambda/\mu)^n (1 - \lambda/\mu)$, we must be careful, because if $t$ is the
current time, then the time from $t$ until the next arrival is exponentially distributed
with rate $\lambda$, and is independent of the time from $t$ since the last arrival, which (in
the limit, as $t$ goes to infinity) is also exponential with rate $\lambda$. Thus, although the
times between successive arrivals of a Poisson process are exponential with rate $\lambda$,
the time between the previous arrival before $t$ and the first arrival after $t$ is distributed
as the sum of two independent exponentials. (This is an illustration of the inspection
paradox, which results because the length of an interarrival interval that contains a
specified time tends to be longer than an ordinary interarrival interval; see Section
7.7.)

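Let $N_a$ denote the number found by the next arrival, and let $X$ be the number
currently in the system. Conditioning on $X$ yields

$$
\begin{aligned}
P\{N_a = n\} &= \sum_{k=0}^{\infty} P\{N_a = n \mid X = k\} P\{X = k\} \\
&= \sum_{k=0}^{\infty} P\{N_a = n \mid X = k\} (\lambda/\mu)^k (1 - \lambda/\mu) \\
&= \sum_{k=n}^{\infty} P\{N_a = n \mid X = k\} (\lambda/\mu)^k (1 - \lambda/\mu) \\
&= \sum_{i=0}^{\infty} P\{N_a = n \mid X = n + i\} (\lambda/\mu)^{n+i} (1 - \lambda/\mu)
\end{aligned}
$$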

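Now, for $n > 0$, given there are currently $n + i$ in the system, the next arrival
will find $n$ if we have $i$ services before an arrival and then an arrival before the
next service completion. By the lack of memory property of exponential interarrival
random variables, this gives

$$P\{N_a = n \mid X = n + i\} = \left(\frac{\mu}{\lambda + \mu}\right)^i \frac{\lambda}{\lambda + \mu}, \quad n > 0$$

Consequently, for $n > 0$,

$$
\begin{aligned}
P\{N_a = n\} &= \sum_{i=0}^{\infty} \left(\frac{\mu}{\lambda + \mu}\right)^i \frac{\lambda}{\lambda + \mu} \left(\frac{\lambda}{\mu}\right)^{n+i} (1 - \lambda/\mu) \\
&= (\lambda/\mu)^n (1 - \lambda/\mu) \frac{\lambda}{\lambda + \mu} \sum_{i=0}^{\infty} \left(\frac{\lambda}{\lambda + \mu}\right)^i \\
&= (\lambda/\mu)^{n+1} (1 - \lambda/\mu)
\end{aligned}
$$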


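On the other hand, the probability that the next arrival will find the system empty,
when there are currently $i$ in the system, is the probability that there are $i$ services
before the next arrival. Therefore, $P\{N_a = 0 \mid X = i\} = \left(\frac{\mu}{\lambda + \mu}\right)^i$, giving

$$
\begin{aligned}
P\{N_a = 0\} &= \sum_{i=0}^{\infty} \left(\frac{\mu}{\lambda + \mu}\right)^i \left(\frac{\lambda}{\mu}\right)^i (1 - \lambda/\mu) \\
&= (1 - \lambda/\mu) \sum_{i=0}^{\infty} \left(\frac{\lambda}{\lambda + \mu}\right)^i \\
&= (1 + \lambda/\mu)(1 - \lambda/\mu)
\end{aligned}
$$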


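As a check, note that

$$
\begin{aligned}
\sum_{n=0}^{\infty} P\{N_a = n\} &= (1 - \lambda/\mu) \left[1 + \lambda/\mu + \sum_{n=1}^{\infty} (\lambda/\mu)^{n+1}\right] \\
&= (1 - \lambda/\mu) \sum_{i=0}^{\infty} (\lambda/\mu)^i \\
&= 1
\end{aligned}
$$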


Note that $P\{N_a = 0\}$ is larger than $P_0 = 1 - \lambda/\mu$, showing that the next arrival
is more likely to find an empty system than is an average arrival, and thus illustrating
the inspection paradox that when the next customer arrives the elapsed time since
the previous arrival is distributed as the sum of two independent exponentials with
rate $\lambda$. Also, we might expect because of the inspection paradox that $E[N_a]$ is less
than $L$, the average number of customers seen by an arrival. That this is indeed the
case is seen from

$$E[N_a] = \sum_{n=1}^{\infty} n (\lambda/\mu)^{n+1} (1 - \lambda/\mu) = \frac{\lambda}{\mu} L < L \quad \blacksquare$$
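
The distribution of $N_a$ derived in Example 8.4 can be tabulated and checked numerically: it should sum to 1, put more mass at 0 than $P_0$ does, and have mean $(\lambda/\mu)L$. The sketch below assumes illustrative rates $\lambda = 1$, $\mu = 2$ and truncates the support at 200; the function name `next_arrival_dist` is introduced here for illustration only.

```python
def next_arrival_dist(lam, mu, n_max):
    """P{N_a = n} for n = 0..n_max, using the formulas of Example 8.4."""
    rho = lam / mu
    probs = [(1 + rho) * (1 - rho)]                        # n = 0
    probs += [rho**(n + 1) * (1 - rho) for n in range(1, n_max + 1)]
    return probs

lam, mu = 1.0, 2.0              # illustrative rates (assumed)
rho = lam / mu
q = next_arrival_dist(lam, mu, 200)

total = sum(q)                                   # should be close to 1
mean_next = sum(n * p for n, p in enumerate(q))  # E[N_a]
L = lam / (mu - lam)                             # Equation (8.7)
```

Comparing `mean_next` with `rho * L` and `q[0]` with `1 - rho` illustrates the two consequences of the inspection paradox noted in the example.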
8.3.2 A Single-Server Exponential Queueing System Having
Finite Capacity

In the previous model, we assumed that there was no limit on the number of customers 
that could be in the system at the same time. However, in reality there is always a finite 
system capacity N, in the sense that there can be no more than N customers in the 
system at any time. By this, we mean that if an arriving customer finds that there are 
already N customers present, then he does not enter the system. 

As before, we let $P_n$, $0 \leq n \leq N$, denote the limiting probability that there are $n$
customers in the system. The rate-equality principle yields the following set of balance
equations:

State                   Rate at which the process leaves = rate at which it enters
0                       $\lambda P_0 = \mu P_1$
$1 \leq n \leq N - 1$   $(\lambda + \mu)P_n = \lambda P_{n-1} + \mu P_{n+1}$
$N$                     $\mu P_N = \lambda P_{N-1}$

The argument for state 0 is exactly as before. Namely, when in state 0, the process
will leave only via an arrival (which occurs at rate $\lambda$) and hence the rate at which the
process leaves state 0 is $\lambda P_0$. On the other hand, the process can enter state 0 only from
state 1 via a departure; hence, the rate at which the process enters state 0 is $\mu P_1$. The
equation for state $n$, where $1 \leq n < N$, is the same as before. The equation for state
$N$ is different because now state $N$ can only be left via a departure since an arriving
customer will not enter the system when it is in state $N$; also, state $N$ can now only be
entered from state $N - 1$ (as there is no longer a state $N + 1$) via an arrival.

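We could now either solve the balance equations exactly as we did for the infinite
capacity model, or we could save a few lines by directly using the result that the rate
at which departures leave behind $n - 1$ is equal to the rate at which arrivals find $n - 1$.
Invoking this result yields

$$\mu P_n = \lambda P_{n-1}, \quad n = 1, \ldots, N$$

giving

$$P_n = \frac{\lambda}{\mu} P_{n-1} = \left(\frac{\lambda}{\mu}\right)^2 P_{n-2} = \cdots = \left(\frac{\lambda}{\mu}\right)^n P_0, \quad n = 1, \ldots, N$$

By using the fact that $\sum_{n=0}^{N} P_n = 1$ we obtain

$$1 = P_0 \sum_{n=0}^{N} \left(\frac{\lambda}{\mu}\right)^n = P_0 \left[\frac{1 - (\lambda/\mu)^{N+1}}{1 - \lambda/\mu}\right]$$

or

$$P_0 = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}}$$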
and hence from the preceding we obtain

$$P_n = \frac{(\lambda/\mu)^n (1 - \lambda/\mu)}{1 - (\lambda/\mu)^{N+1}}, \quad n = 0, 1, \ldots, N$$

Note that in this case, there is no need to impose the condition that $\lambda/\mu < 1$. The queue
size is, by definition, bounded so there is no possibility of its increasing indefinitely.
As before, $L$ may be expressed in terms of $P_n$ to yield

$$
\begin{aligned}
L &= \sum_{n=0}^{N} n P_n \\
&= \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}} \sum_{n=0}^{N} n \left(\frac{\lambda}{\mu}\right)^n
\end{aligned}
$$

which after some algebra yields

$$L = \frac{\lambda \left[1 + N(\lambda/\mu)^{N+1} - (N + 1)(\lambda/\mu)^N\right]}{(\mu - \lambda)\left(1 - (\lambda/\mu)^{N+1}\right)}$$
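
Since the closed form for $L$ is reached only "after some algebra," it is worth comparing it with the direct sum $\sum_{n=0}^{N} n P_n$. The sketch below does so for one set of values in which $\lambda > \mu$, to emphasize that no stability condition is needed in the finite capacity model. The function names and the values $\lambda = 3$, $\mu = 2$, $N = 5$ are illustrative assumptions; note that the closed form (though not the direct sum) requires $\lambda \neq \mu$.

```python
def mm1n_probs(lam, mu, N):
    """Limiting probabilities P_0..P_N for the M/M/1 queue with capacity N."""
    rho = lam / mu
    weights = [rho**n for n in range(N + 1)]
    norm = sum(weights)                      # normalizes so the P_n sum to 1
    return [w / norm for w in weights]

def mm1n_L_closed(lam, mu, N):
    """Closed-form average number in system; requires lam != mu."""
    rho = lam / mu
    numerator = lam * (1 + N * rho**(N + 1) - (N + 1) * rho**N)
    denominator = (mu - lam) * (1 - rho**(N + 1))
    return numerator / denominator

lam, mu, N = 3.0, 2.0, 5        # illustrative values with lam > mu (assumed)
P = mm1n_probs(lam, mu, N)
L_direct = sum(n * p for n, p in enumerate(P))
L_closed = mm1n_L_closed(lam, mu, N)
```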


In deriving $W$, the expected amount of time a customer spends in the system, we
must be a little careful about what we mean by a customer. Specifically, are we including
those “customers” who arrive to find the system full and thus do not spend any time in
the system? Or, do we just want the expected time spent in the system by a customer who
actually entered the system? The two questions lead, of course, to different answers. In
the first case, we have $\lambda_a = \lambda$; whereas in the second case, since the fraction of arrivals
that actually enter the system is $1 - P_N$, it follows that $\lambda_a = \lambda(1 - P_N)$. Once it is
clear what we mean by a customer, $W$ can be obtained from

$$W = \frac{L}{\lambda_a}$$

Example 8.5 Suppose that it costs $c\mu$ dollars per hour to provide service at a rate $\mu$.
Suppose also that we incur a gross profit of $A$ dollars for each customer served. If the
system has a capacity $N$, what service rate $\mu$ maximizes our total profit?

Solution: To solve this, suppose that we use rate $\mu$. Let us determine the amount
of money coming in per hour and subtract from this the amount going out each hour.
This will give us our profit per hour, and we can choose $\mu$ so as to maximize this.

Now, potential customers arrive at a rate $\lambda$. However, a certain proportion of
them do not join the system, namely those who arrive when there are $N$ customers
already in the system. Hence, since $P_N$ is the proportion of time that the system
is full, it follows that entering customers arrive at a rate of $\lambda(1 - P_N)$. Since each
customer pays $A$ dollars, it follows that money comes in at an hourly rate of $\lambda(1 - P_N)A$
and since it goes out at an hourly rate of $c\mu$, it follows that our total profit per hour
is given by

$$
\begin{aligned}
\text{profit per hour} &= \lambda(1 - P_N)A - c\mu \\
&= \lambda A \left[1 - \frac{(\lambda/\mu)^N (1 - \lambda/\mu)}{1 - (\lambda/\mu)^{N+1}}\right] - c\mu \\
&= \frac{\lambda A \left[1 - (\lambda/\mu)^N\right]}{1 - (\lambda/\mu)^{N+1}} - c\mu
\end{aligned}
$$

For instance if $N = 2$, $\lambda = 1$, $A = 10$, $c = 1$, then

$$
\begin{aligned}
\text{profit per hour} &= \frac{10\left[1 - (1/\mu)^2\right]}{1 - (1/\mu)^3} - \mu \\
&= \frac{10(\mu^3 - \mu)}{\mu^3 - 1} - \mu
\end{aligned}
$$

In order to maximize profit we differentiate to obtain

$$\frac{d}{d\mu}[\text{profit per hour}] = 10\,\frac{2\mu^3 - 3\mu^2 + 1}{(\mu^3 - 1)^2} - 1$$

The value of $\mu$ that maximizes our profit now can be obtained by equating to zero
and solving numerically. ■
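
One way to carry out the numerical solution in Example 8.5 is bisection on the derivative. The sketch below uses the profit and derivative expressions for $N = 2$, $\lambda = 1$, $A = 10$, $c = 1$ exactly as given in the example; the choice of bisection, the bracket $[1.5, 3]$ (picked by inspection so that the derivative changes sign and $\mu = 1$ is avoided), and the tolerance are assumptions made here, not part of the text.

```python
def profit(mu):
    """Profit per hour for N = 2, lambda = 1, A = 10, c = 1."""
    return 10 * (mu**3 - mu) / (mu**3 - 1) - mu

def d_profit(mu):
    """Derivative of the profit per hour with respect to mu."""
    return 10 * (2 * mu**3 - 3 * mu**2 + 1) / (mu**3 - 1)**2 - 1

def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Bracket chosen by inspection (an assumption): the derivative is positive
# at 1.5 and negative at 3.0, and the interval avoids the point mu = 1.
mu_star = bisect(d_profit, 1.5, 3.0)
best_profit = profit(mu_star)
```

Under these assumptions the root lies slightly above $\mu = 2$, since the derivative is $50/49 - 1 > 0$ at $\mu = 2$ and is already negative at $\mu = 2.1$.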

We say that a queueing system alternates between idle periods when there are no 
customers in the system and busy periods in which there is at least one customer in 
the system. We will end this section by determining the expected value and variance 
of the number of lost customers in a busy period, where a customer is said to be lost if 
it arrives when the system is at capacity. 

To determine the preceding quantities, let $L_n$ denote the number of lost customers
in a busy period of a finite capacity M/M/1 queue in which an arrival finding $n$ others
does not join the system. To derive an expression for $E[L_n]$ and $\text{Var}(L_n)$, suppose a
busy period has just begun and condition on whether the next event is an arrival or a
departure. Now, with

$$
I = \begin{cases}
0, & \text{if service completion occurs before next arrival} \\
1, & \text{if arrival before service completion}
\end{cases}
$$

note that if $I = 0$ then the busy period will end before the next arrival and so there will
be no lost customers in that busy period. As a result

$$E[L_n \mid I = 0] = \text{Var}(L_n \mid I = 0) = 0$$


Now suppose that the next arrival appears before the end of the first service time, and
so $I = 1$. Then if $n = 1$ that arrival will be lost and it will be as if the busy period
were just beginning anew at that point, yielding that the conditional number of lost
customers has the same distribution as does $1 + L_1$. On the other hand, if $n > 1$ then
at the moment of the arrival there will be two customers in the system, the one in
service and the “second customer” who has just arrived. Because the distribution of
the number of lost customers in a busy period does not depend on the order in which
customers are served, let us suppose that the “second customer” is put aside and does
not receive any service until it is the only remaining customer. Then it is easy to see that
the number of lost customers until that “second customer” begins service has the same
distribution as the number of lost customers in a busy period when the system capacity
is $n - 1$. Moreover, the additional number of lost customers in the busy period starting
when service begins on the “second customer” has the distribution of the number of
lost customers in a busy period when the system capacity is $n$. Consequently, given
$I = 1$, $L_n$ has the distribution of the sum of two independent random variables: one of
which is distributed as $L_{n-1}$ and represents the number of lost customers before there
is again only a single customer in the system, and the other which is distributed as $L_n$
and represents the additional number of lost customers from the moment when there
is again a single customer until the busy period ends. Hence,

$$
E[L_n \mid I = 1] = \begin{cases}
1 + E[L_1], & \text{if } n = 1 \\
E[L_{n-1}] + E[L_n], & \text{if } n > 1
\end{cases}
$$

and

$$
\text{Var}(L_n \mid I = 1) = \begin{cases}
\text{Var}(L_1), & \text{if } n = 1 \\
\text{Var}(L_{n-1}) + \text{Var}(L_n), & \text{if } n > 1
\end{cases}
$$


Letting

$$m_n = E[L_n] \quad \text{and} \quad v_n = \text{Var}(L_n)$$

then, with $m_0 = 1$, $v_0 = 0$, the preceding equations can be rewritten as

$$E[L_n \mid I] = I(m_{n-1} + m_n), \tag{8.9}$$

$$\text{Var}(L_n \mid I) = I(v_{n-1} + v_n) \tag{8.10}$$

Using that $P(I = 1) = P(\text{arrival before service}) = \frac{\lambda}{\lambda + \mu} = 1 - P(I = 0)$, we
obtain upon taking expectations of both sides of Equation (8.9) that

$$m_n = \frac{\lambda}{\lambda + \mu}[m_n + m_{n-1}]$$

or

$$m_n = \frac{\lambda}{\mu} m_{n-1}$$

Starting with $m_1 = \lambda/\mu$, this yields the result

$$m_n = (\lambda/\mu)^n$$
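To determine $v_n$, we use the conditional variance formula. Using Equations (8.9) and
(8.10) it gives

$$
\begin{aligned}
v_n &= (v_n + v_{n-1})E[I] + (m_n + m_{n-1})^2 \text{Var}(I) \\
&= \frac{\lambda}{\lambda + \mu}(v_n + v_{n-1}) + \left[(\lambda/\mu)^n + (\lambda/\mu)^{n-1}\right]^2 \frac{\lambda}{\lambda + \mu} \frac{\mu}{\lambda + \mu} \\
&= \frac{\lambda}{\lambda + \mu}(v_n + v_{n-1}) + (\lambda/\mu)^{2n-2} \left(\frac{\lambda}{\mu} + 1\right)^2 \frac{\lambda\mu}{(\lambda + \mu)^2} \\
&= \frac{\lambda}{\lambda + \mu}(v_n + v_{n-1}) + (\lambda/\mu)^{2n-1}
\end{aligned}
$$

Hence,

$$\mu v_n = \lambda v_{n-1} + (\lambda + \mu)(\lambda/\mu)^{2n-1}$$

or, with $\rho = \lambda/\mu$

$$v_n = \rho v_{n-1} + \rho^{2n-1} + \rho^{2n}$$

Therefore,

$$
\begin{aligned}
v_1 &= \rho + \rho^2, \\
v_2 &= \rho^2 + 2\rho^3 + \rho^4, \\
v_3 &= \rho^3 + 2\rho^4 + 2\rho^5 + \rho^6, \\
v_4 &= \rho^4 + 2\rho^5 + 2\rho^6 + 2\rho^7 + \rho^8
\end{aligned}
$$

and, in general,

$$v_n = \rho^n + 2\sum_{j=n+1}^{2n-1} \rho^j + \rho^{2n}$$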
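
The general formula for $v_n$ can be checked against the recursion $v_n = \rho v_{n-1} + \rho^{2n-1} + \rho^{2n}$ with $v_0 = 0$. The sketch below does this with exact rational arithmetic; the function names and the value $\rho = 7/10$ are illustrative assumptions.

```python
from fractions import Fraction

def lost_variance_recursive(rho, n):
    """v_n from the recursion v_k = rho*v_{k-1} + rho^(2k-1) + rho^(2k), v_0 = 0."""
    v = Fraction(0)
    for k in range(1, n + 1):
        v = rho * v + rho**(2 * k - 1) + rho**(2 * k)
    return v

def lost_variance_closed(rho, n):
    """v_n = rho^n + 2 * sum_{j=n+1}^{2n-1} rho^j + rho^(2n)."""
    middle = sum(rho**j for j in range(n + 1, 2 * n))   # empty when n = 1
    return rho**n + 2 * middle + rho**(2 * n)

rho = Fraction(7, 10)           # illustrative value of lambda/mu (assumed)
pairs = [(lost_variance_recursive(rho, n), lost_variance_closed(rho, n))
         for n in range(1, 11)]
```

Because `Fraction` arithmetic is exact, the two entries of each pair should be identical rather than merely close.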


8.3.3 Birth and Death Queueing Models 

An exponential queueing system in which the arrival rates and the departure rates
depend on the number of customers in the system is known as a birth and death queueing
model. Let $\lambda_n$ denote the arrival rate and let $\mu_n$ denote the departure rate when there
are $n$ customers in the system. Loosely speaking, when there are $n$ customers in the
system then the time until the next arrival is exponential with rate $\lambda_n$ and is independent
of the time of the next departure, which is exponential with rate $\mu_n$. Equivalently, and
more formally, whenever there are $n$ customers in the system, the time until either the
next arrival or the next departure occurs is an exponential random variable with rate
$\lambda_n + \mu_n$ and, independent of how long it takes for this occurrence, it will be an arrival
with probability $\frac{\lambda_n}{\lambda_n + \mu_n}$. We now give some examples of birth and death queues.
(a) The M/M/1 Queueing System

Because the arrival rate is always $\lambda$, and the departure rate is $\mu$ when the system is
nonempty, the M/M/1 is a birth and death model with

$$
\begin{aligned}
\lambda_n &= \lambda, \quad n \geq 0 \\
\mu_n &= \mu, \quad n \geq 1
\end{aligned}
$$

(b) The M/M/1 Queueing System with Balking

Consider the M/M/1 system but now suppose that a customer that finds $n$ others
in the system upon its arrival will only join the system with probability $\alpha_n$. (That is,
with probability $1 - \alpha_n$ it balks at joining the system.) Then this system is a birth
and death model with

$$
\begin{aligned}
\lambda_n &= \lambda \alpha_n, \quad n \geq 0 \\
\mu_n &= \mu, \quad n \geq 1
\end{aligned}
$$

The M/M/1 with finite capacity $N$ is the special case where

$$
\alpha_n = \begin{cases}
1, & \text{if } n < N \\
0, & \text{if } n \geq N
\end{cases}
$$

(c) The M/M/k Queueing System

Consider a $k$ server system in which customers arrive according to a Poisson
process with rate $\lambda$. An arriving customer immediately enters service if any of
the $k$ servers are free. If all $k$ servers are busy, then the arrival joins the queue.
When a server completes a service the customer served departs the system and if
there are any customers in queue then the one who has been waiting longest enters
service with that server. All service times are exponential random variables with
rate $\mu$. Because customers are always arriving at rate $\lambda$,

$$\lambda_n = \lambda, \quad n \geq 0$$

Now, when there are $n \leq k$ customers in the system then each customer will
be receiving service and so the time until a departure will be the minimum of $n$
independent exponentials each having rate $\mu$, and so will be exponential with rate
$n\mu$. On the other hand if there are $n > k$ in the system then only $k$ of the $n$ will
be in service, and so the departure rate in this case is $k\mu$. Hence, the M/M/k is a
birth and death queueing model with arrival rates

$$\lambda_n = \lambda, \quad n \geq 0$$

and departure rates

$$
\mu_n = \begin{cases}
n\mu, & \text{if } n \leq k \\
k\mu, & \text{if } n \geq k
\end{cases}
$$
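To analyze the general birth and death queueing model, let $P_n$ denote the long-run
proportion of time there are $n$ in the system. Then, either as a consequence of the
balance equations given by

State        Rate at which process leaves = rate at which process enters
$n = 0$      $\lambda_0 P_0 = \mu_1 P_1$
$n \geq 1$   $(\lambda_n + \mu_n)P_n = \lambda_{n-1}P_{n-1} + \mu_{n+1}P_{n+1}$

or by directly using the result that the rate at which arrivals find $n$ in the system is equal
to the rate at which departures leave behind $n$, we obtain

$$\lambda_n P_n = \mu_{n+1} P_{n+1}, \quad n \geq 0$$

or, equivalently, that

$$P_{n+1} = \frac{\lambda_n}{\mu_{n+1}} P_n, \quad n \geq 0$$

Thus,

$$
\begin{aligned}
P_0 &= P_0, \\
P_1 &= \frac{\lambda_0}{\mu_1} P_0, \\
P_2 &= \frac{\lambda_1}{\mu_2} P_1 = \frac{\lambda_1 \lambda_0}{\mu_2 \mu_1} P_0, \\
P_3 &= \frac{\lambda_2}{\mu_3} P_2 = \frac{\lambda_2 \lambda_1 \lambda_0}{\mu_3 \mu_2 \mu_1} P_0
\end{aligned}
$$

and, in general

$$P_n = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} P_0, \quad n \geq 1$$

Using that $\sum_{n=0}^{\infty} P_n = 1$ shows that

$$1 = P_0 \left[1 + \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}\right]$$

Hence,

$$P_0 = \frac{1}{1 + \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}$$

and

$$P_n = \frac{\dfrac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}{1 + \sum_{n=1}^{\infty} \dfrac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}, \quad n \geq 1$$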
The necessary and sufficient condition for the long-run probabilities to exist is that
the denominator in the preceding is finite. That is, we need to have that

$$\sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} < \infty$$
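
The product formula for the limiting probabilities of a birth and death queue translates directly into a short routine. The sketch below approximates the infinite state space by truncating at a finite `n_max`, which is an assumption that is reasonable only when the series above converges and the neglected tail is small; the function name and its callable arguments `birth` and `death` are hypothetical names introduced for illustration.

```python
def birth_death_probs(birth, death, n_max):
    """Approximate P_0..P_{n_max} for a birth and death queue.

    birth(n) returns lambda_n for n >= 0 and death(n) returns mu_n for n >= 1.
    The state space is truncated at n_max (an approximation).
    """
    weights = [1.0]                          # weight for n = 0
    for n in range(1, n_max + 1):
        # weight_n = lambda_0 ... lambda_{n-1} / (mu_1 ... mu_n)
        weights.append(weights[-1] * birth(n - 1) / death(n))
    norm = sum(weights)
    return [w / norm for w in weights]

# Check against the M/M/1 queue with lambda = 1, mu = 2 (illustrative rates),
# for which P_n = (1/2)^n * (1/2).
P = birth_death_probs(lambda n: 1.0, lambda n: 2.0, 200)
```

Passing the rates as functions of $n$ keeps the routine independent of any particular model.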


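Example 8.6 For the M/M/k system

$$
\frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} = \begin{cases}
\dfrac{(\lambda/\mu)^n}{n!}, & \text{if } n \leq k \\[2ex]
\dfrac{\lambda^n}{\mu^n k!\, k^{n-k}}, & \text{if } n > k
\end{cases}
$$

Hence, using that $\frac{\lambda^n}{\mu^n k!\, k^{n-k}} = (\lambda/k\mu)^n k^k/k!$ we see that

$$
\begin{aligned}
P_0 &= \frac{1}{1 + \sum_{n=1}^{k} (\lambda/\mu)^n/n! + \sum_{n=k+1}^{\infty} (\lambda/k\mu)^n k^k/k!}, \\
P_n &= P_0 (\lambda/\mu)^n/n!, \quad \text{if } n \leq k \\
P_n &= P_0 (\lambda/k\mu)^n k^k/k!, \quad \text{if } n > k
\end{aligned}
$$

It follows from the preceding that the condition needed for the limiting probabilities to
exist is $\lambda < k\mu$. Because $k\mu$ is the service rate when all servers are busy, the preceding
is just the intuitive condition that for limiting probabilities to exist the service rate needs
to be larger than the arrival rate when there are many customers in the system. ■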
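
The expression for $P_0$ in the M/M/k system can be evaluated in closed form, because the sum over $n > k$ is a geometric series: $\sum_{n=k+1}^{\infty} (\lambda/k\mu)^n = (\lambda/k\mu)^{k+1}/(1 - \lambda/k\mu)$ when $\lambda < k\mu$. The sketch below uses that simplification, which is an algebraic step added here rather than one shown in the example; the function name `mmk_p0` is likewise an assumption.

```python
import math

def mmk_p0(lam, mu, k):
    """P_0 for the M/M/k queue; requires lam < k*mu."""
    if not lam < k * mu:
        raise ValueError("limiting probabilities exist only when lam < k*mu")
    a = lam / mu
    r = lam / (k * mu)
    # Terms n = 0..k: the leading 1 plus sum_{n=1}^{k} (lam/mu)^n / n!
    head = sum(a**n / math.factorial(n) for n in range(k + 1))
    # Terms n = k+1, k+2, ...: geometric series times k^k / k!
    tail = (k**k / math.factorial(k)) * r**(k + 1) / (1 - r)
    return 1 / (head + tail)

p0_single = mmk_p0(1.0, 2.0, 1)   # k = 1 reduces to M/M/1: 1 - lam/mu
p0_double = mmk_p0(1.0, 1.0, 2)   # two servers, lam = mu = 1
```

With $k = 1$ the result reduces to $1 - \lambda/\mu$, in agreement with Equation (8.6).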


Example 8.7 (M/M/1 Queue with Impatient Customers) Consider a single-server
queue where customers arrive according to a Poisson process with rate $\lambda$ and where the
service distribution is exponential with rate $\mu$, but now suppose that each customer will
only spend an exponential time with rate $\alpha$ in queue before quitting the system. Assume
that the impatient times are independent of all else, and that a customer who enters
service always remains until its service is completed. This system can be modeled as
a birth and death process with birth and death rates

$$
\begin{aligned}
\lambda_n &= \lambda, \quad n \geq 0 \\
\mu_n &= \mu + (n - 1)\alpha, \quad n \geq 1
\end{aligned}
$$

Using the previously obtained limiting probabilities enables us to answer a variety
of questions about this system. For instance, suppose we wanted to determine the
proportion of arrivals that receive service. Calling this quantity $\pi_s$, it can be obtained
by letting $\lambda_s$ be the average rate at which customers are served and noting that

$$\pi_s = \frac{\lambda_s}{\lambda}$$

To verify the preceding equation, let $N_a(t)$ and $N_s(t)$ denote, respectively, the number
of arrivals and the number of services by time $t$. Then,

$$\pi_s = \lim_{t \to \infty} \frac{N_s(t)}{N_a(t)} = \lim_{t \to \infty} \frac{N_s(t)/t}{N_a(t)/t} = \frac{\lambda_s}{\lambda}$$

Because the service departure rate is 0 when the system is empty and is $\mu$ when the
system is nonempty, it follows that $\lambda_s = \mu(1 - P_0)$, yielding that

$$\pi_s = \frac{\mu(1 - P_0)}{\lambda} \quad \blacksquare$$
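
To make Example 8.7 concrete, the proportion of arrivals that receive service can be computed numerically from the birth and death rates. The sketch below truncates the state space at 200 states and uses $\lambda = 2$, $\mu = 1$, $\alpha = 0.5$; all three values, the truncation point, and the function name are illustrative assumptions. As a consistency check that is added here and not part of the example, the rate at which customers are served plus the rate at which waiting customers quit should equal the arrival rate.

```python
def impatient_queue_probs(lam, mu, alpha, n_max):
    """Approximate P_0..P_{n_max} with lambda_n = lam, mu_n = mu + (n - 1)*alpha."""
    weights = [1.0]
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * lam / (mu + (n - 1) * alpha))
    norm = sum(weights)
    return [w / norm for w in weights]

lam, mu, alpha = 2.0, 1.0, 0.5      # illustrative rates (assumed)
P = impatient_queue_probs(lam, mu, alpha, 200)

served_rate = mu * (1 - P[0])       # lambda_s = mu * (1 - P_0)
pi_s = served_rate / lam            # proportion of arrivals that are served

# Rate at which waiting customers quit: (n - 1)*alpha while n are present.
reneging_rate = sum((n - 1) * alpha * P[n] for n in range(1, len(P)))
```

Note that these rates have $\lambda > \mu$; the limiting probabilities still exist here because the total departure rate $\mu + (n - 1)\alpha$ grows with $n$.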

To determine $W$, the average time that a customer spends in the system, for the birth
and death queueing system, we employ the fundamental queueing identity $L = \lambda_a W$.
Because $L$ is the average number of customers in the system,

$$L = \sum_{n=0}^{\infty} n P_n$$

Also, because the arrival rate when there are $n$ in the system is $\lambda_n$ and the proportion
of time in which there are $n$ in the system is $P_n$, we see that the average arrival rate of
customers is

$$\lambda_a = \sum_{n=0}^{\infty} \lambda_n P_n$$

Consequently,

$$W = \frac{\sum_{n=0}^{\infty} n P_n}{\sum_{n=0}^{\infty} \lambda_n P_n}$$

Now consider $a_n$ equal to the proportion of arrivals that find $n$ in the system. Since
arrivals are at rate $\lambda_n$ whenever there are $n$ in system it follows that the rate at which
arrivals find $n$ is $\lambda_n P_n$. Hence, in a large time $T$ approximately $\lambda_n P_n T$ of the approximately
$\lambda_a T$ arrivals will encounter $n$. Letting $T$ go to infinity shows that the long-run
proportion of arrivals finding $n$ in the system is

$$a_n = \frac{\lambda_n P_n}{\lambda_a}$$

Let us now consider the average length of a busy period, where we say that the system
alternates between idle periods when there are no customers in the system and busy
periods in which there is at least one customer in the system. Now, an idle period
begins when the system is empty and ends when the next customer arrives. Because the
arrival rate when the system is empty is $\lambda_0$, it thus follows that, independent of all that
previously occurred, the length of an idle period is exponential with rate $\lambda_0$. Because a
busy period always begins when there is one in the system and ends when the system
is empty, it is easy to see that the lengths of successive busy periods are independent
and identically distributed. Let $I_j$ and $B_j$ denote, respectively, the lengths of the $j$th
idle and the $j$th busy period, $j \geq 1$. Now, in the first $\sum_{j=1}^{n} (I_j + B_j)$ time units the
system will be empty for a time $\sum_{j=1}^{n} I_j$. Consequently, $P_0$, the long-run proportion








of time in which the system is empty, can be expressed as

$$
\begin{aligned}
P_0 &= \text{long-run proportion of time empty} \\
&= \lim_{n \to \infty} \frac{I_1 + \cdots + I_n}{I_1 + \cdots + I_n + B_1 + \cdots + B_n} \\
&= \lim_{n \to \infty} \frac{(I_1 + \cdots + I_n)/n}{(I_1 + \cdots + I_n)/n + (B_1 + \cdots + B_n)/n} \\
&= \frac{E[I]}{E[I] + E[B]}
\end{aligned}
\tag{8.11}
$$

where $I$ and $B$ represent, respectively, the lengths of an idle and of a busy period, and
where the final equality follows from the strong law of large numbers. Hence, using
that $E[I] = 1/\lambda_0$, we see that

$$P_0 = \frac{1}{1 + \lambda_0 E[B]}$$

or,

$$E[B] = \frac{1 - P_0}{\lambda_0 P_0} \tag{8.12}$$

For instance, in the M/M/1 queue, this yields $E[B] = \frac{\lambda/\mu}{\lambda(1 - \lambda/\mu)} = \frac{1}{\mu - \lambda}$.

Another quantity of interest is $T_n$, the amount of time during a busy period that there
are $n$ in the system. To determine its mean, note that $E[T_n]$ is the average amount of
time there are $n$ in the system in intervals between successive busy periods. Because
the average time between successive busy periods is $E[B] + E[I]$, it follows that

$$
\begin{aligned}
P_n &= \text{long-run proportion of time there are } n \text{ in system} \\
&= \frac{E[T_n]}{E[I] + E[B]} \\
&= \frac{E[T_n] P_0}{E[I]} \quad \text{from (8.11)}
\end{aligned}
$$

Hence,

$$E[T_n] = \frac{P_n}{\lambda_0 P_0} = \frac{\lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}$$

As a check, note that

$$B = \sum_{n=1}^{\infty} T_n$$

and thus,

$$E[B] = \sum_{n=1}^{\infty} E[T_n] = \frac{1}{\lambda_0 P_0} \sum_{n=1}^{\infty} P_n = \frac{1 - P_0}{\lambda_0 P_0}$$

which is in agreement with (8.12).

















For the M/M/1 system, the preceding gives $E[T_n] = \lambda^{n-1}/\mu^n$.
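
For the M/M/1 queue the two expressions for the mean busy period can be compared numerically: summing $E[T_n] = \lambda^{n-1}/\mu^n$ over $n \geq 1$ should give $E[B] = 1/(\mu - \lambda)$. The sketch below assumes the illustrative rates $\lambda = 1$, $\mu = 2$ and truncates the sum after 199 terms; the function name is introduced here for illustration.

```python
def mm1_time_in_state(lam, mu, n):
    """E[T_n]: mean time during a busy period with n customers present (M/M/1)."""
    return lam**(n - 1) / mu**n

lam, mu = 1.0, 2.0              # illustrative rates (assumed)
busy_from_sum = sum(mm1_time_in_state(lam, mu, n) for n in range(1, 200))
busy_closed = 1 / (mu - lam)    # E[B] from Equation (8.12) with P_0 = 1 - lam/mu
```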

Whereas in exponential birth and death queueing models the state of the system 
is just the number of customers in the system, there are other exponential models in 
which a more detailed state space is needed. To illustrate, we consider some examples. 

8.3.4 A Shoe Shine Shop 

Consider a shoe shine shop consisting of two chairs. Suppose that an entering customer 
first will go to chair 1. When his work is completed in chair 1, he will go either to chair 
2 if that chair is empty or else wait in chair 1 until chair 2 becomes empty. Suppose 
that a potential customer will enter this shop as long as chair 1 is empty. (Thus, for 
instance, a potential customer might enter even if there is a customer in chair 2.) 

If we suppose that potential customers arrive in accordance with a Poisson process at
rate $\lambda$, and that the service times for the two chairs are independent and have respective
exponential rates of $\mu_1$ and $\mu_2$, then

(a) what proportion of potential customers enters the system? 

(b) what is the mean number of customers in the system? 

(c) what is the average amount of time that an entering customer spends in the system? 

(d) what is $\pi_b$, the fraction of entering customers that are blockers? That is, find
the fraction of entering customers that will have to wait after completing service
with server 1 before they can enter chair 2.

To begin we must first decide upon an appropriate state space. It is clear that the state 
of the system must include more information than merely the number of customers in 
the system. For instance, it would not be enough to specify that there is one customer 
in the system as we would also have to know which chair he was in. Further, if we 
only know that there are two customers in the system, then we would not know if the 
man in chair 1 is still being served or if he is just waiting for the person in chair 2 
to finish. To account for these points, the following state space, consisting of the five 
states (0, 0), (1, 0), (0, 1), (1, 1), and (b, 1), will be used. The states have the following
interpretation: 

State     Interpretation

(0, 0)    There are no customers in the system.
(1, 0)    There is one customer in the system, and he is in chair 1.
(0, 1)    There is one customer in the system, and he is in chair 2.
(1, 1)    There are two customers in the system, and both are presently being
          served.
(b, 1)    There are two customers in the system, but the customer in the first
          chair has completed his work in that chair and is waiting for the
          second chair to become free.

It should be noted that when the system is in state (b, 1), the person in chair 1,
though not being served, is nevertheless “blocking” potential arrivals from entering the 
system. 

As a prelude to writing down the balance equations, it is usually worthwhile to
make a transition diagram. This is done by first drawing a circle for each state and
then drawing an arrow labeled by the rate at which the process goes from one state to
another. The transition diagram for this model is shown in Figure 8.1. The explanation
for the diagram is as follows: The arrow from state (0, 0) to state (1, 0) that is labeled
$\lambda$ means that when the process is in state (0, 0), that is, when the system is empty, then
it goes to state (1, 0) at a rate $\lambda$, that is, via an arrival. The arrow from (0, 1) to (1, 1)
is similarly explained.

Figure 8.1 A transition diagram.

When the process is in state (1, 0), it will go to state (0, 1) when the customer in
chair 1 is finished, and this occurs at a rate $\mu_1$; hence the arrow from (1, 0) to (0, 1)
labeled $\mu_1$. The arrow from (1, 1) to (b, 1) is similarly explained.

When in state (b, 1) the process will go to state (0, 1) when the customer in chair 2
completes his service (which occurs at rate $\mu_2$); hence the arrow from (b, 1) to (0, 1)
labeled $\mu_2$. Also, when in state (1, 1) the process will go to state (1, 0) when the man in
chair 2 finishes; hence the arrow from (1, 1) to (1, 0) labeled $\mu_2$. Finally, if the process
is in state (0, 1), then it will go to state (0, 0) when the man in chair 2 completes his
service; hence the arrow from (0, 1) to (0, 0) labeled $\mu_2$.

Because there are no other possible transitions, this completes the transition diagram. 

To write the balance equations we equate the sum of the arrows (multiplied by the 
probability of the states where they originate) coming into a state with the sum of the 
arrows (multiplied by the probability of the state) going out of that state. This gives 

State     Rate that the process leaves = rate that it enters

(0, 0)    $\lambda P_{00} = \mu_2 P_{01}$
(1, 0)    $\mu_1 P_{10} = \lambda P_{00} + \mu_2 P_{11}$
(0, 1)    $(\lambda + \mu_2) P_{01} = \mu_1 P_{10} + \mu_2 P_{b1}$
(1, 1)    $(\mu_1 + \mu_2) P_{11} = \lambda P_{01}$
(b, 1)    $\mu_2 P_{b1} = \mu_1 P_{11}$

These along with the equation

$$P_{00} + P_{10} + P_{01} + P_{11} + P_{b1} = 1$$

may be solved to determine the limiting probabilities. Though it is easy to solve the
preceding equations, the resulting solutions are quite involved and hence will not be
explicitly presented. However, it is easy to answer our questions in terms of these
limiting probabilities. First, since a potential customer will enter the system when the state
is either (0, 0) or (0, 1), it follows that the proportion of customers entering the system
is $P_{00} + P_{01}$. Secondly, since there is one customer in the system whenever the state is
(0, 1) or (1, 0) and two customers in the system whenever the state is (1, 1) or (b, 1),
it follows that $L$, the average number in the system, is given by

$$L = P_{01} + P_{10} + 2(P_{11} + P_{b1})$$

To derive the average amount of time that an entering customer spends in the system,
we use the relationship $W = L/\lambda_a$. Since a potential customer will enter the system
when the state is either (0, 0) or (0, 1), it follows that $\lambda_a = \lambda(P_{00} + P_{01})$ and hence

$$W = \frac{P_{01} + P_{10} + 2(P_{11} + P_{b1})}{\lambda(P_{00} + P_{01})}$$

One way to determine the proportion of entering customers that are blockers is to
condition on the state seen by the customer. Because the state seen by an entering
customer is either (0, 0) or (0, 1), the probability that an entering customer finds the
system in state (0, 1) is $P(01 \mid 00 \text{ or } 01) = \dfrac{P_{01}}{P_{00} + P_{01}}$. As an entering customer will be
a blocker if he or she enters the system when the state is (0, 1) and then completes
service at 1 before server 2 has finished its service, we see that

$$\pi_b = \frac{P_{01}}{P_{00} + P_{01}} \cdot \frac{\mu_1}{\mu_1 + \mu_2}$$

Another way to obtain the proportion of entering customers that are blockers is to let
$\lambda_b$ be the rate at which customers become blockers, and then use that the proportion
of entering customers that are blockers is $\lambda_b/\lambda_a$. Because blockers originate when the
state is (1, 1) and a service at 1 occurs, it follows that $\lambda_b = \mu_1 P_{11}$, and so

$$\pi_b = \frac{\mu_1 P_{11}}{\lambda(P_{00} + P_{01})}$$

That the two solutions agree follows from the balance equation for state (1, 1). ■
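
Since the closed-form solution is not written out, a small numerical sketch may help. The code below solves the balance equations by expressing each probability as a multiple of $P_{00}$ and then normalizing, and it evaluates both expressions for $\pi_b$. The function name `shoe_shine_probabilities` and the rates $\lambda = 1$, $\mu_1 = 2$, $\mu_2 = 3$ are illustrative assumptions only.

```python
from fractions import Fraction

def shoe_shine_probabilities(lam, mu1, mu2):
    """Limiting probabilities of the five-state shoe shine model.

    Four balance equations express each probability as a multiple of P00;
    the normalization condition then fixes P00.
    """
    p00 = Fraction(1)
    p01 = lam * p00 / mu2                 # state (0, 0): lam*P00 = mu2*P01
    p11 = lam * p01 / (mu1 + mu2)         # state (1, 1): (mu1 + mu2)*P11 = lam*P01
    pb1 = mu1 * p11 / mu2                 # state (b, 1): mu2*Pb1 = mu1*P11
    p10 = (lam * p00 + mu2 * p11) / mu1   # state (1, 0): mu1*P10 = lam*P00 + mu2*P11
    total = p00 + p10 + p01 + p11 + pb1
    names = ("00", "10", "01", "11", "b1")
    values = (p00, p10, p01, p11, pb1)
    return {name: value / total for name, value in zip(names, values)}

# Hypothetical rates chosen only to illustrate the formulas.
lam, mu1, mu2 = Fraction(1), Fraction(2), Fraction(3)
P = shoe_shine_probabilities(lam, mu1, mu2)

lam_a = lam * (P["00"] + P["01"])                     # rate of entering customers
L = P["01"] + P["10"] + 2 * (P["11"] + P["b1"])       # mean number in system
W = L / lam_a                                         # mean time in system
pi_b_conditioning = P["01"] / (P["00"] + P["01"]) * mu1 / (mu1 + mu2)
pi_b_rates = mu1 * P["11"] / lam_a

print(P, L, W, pi_b_conditioning, pi_b_rates)
```

The balance equation for state (0, 1) is not used in the construction, so checking it afterward is a simple way to confirm that the remaining equation is indeed redundant.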


8.3.5 A Queueing System with Bulk Service 

In this model, we consider a single-server exponential queueing system in which the
server is able to serve two customers at the same time. Whenever the server completes
a service, she then serves the next two customers at the same time. However, if there is
only one customer in line, then she serves that customer by herself. We shall assume that
her service time is exponential at rate $\mu$ whether she is serving one or two customers.
As usual, we suppose that customers arrive at an exponential rate $\lambda$. One example of
such a system might be an elevator or a cable car that can take at most two passengers
at any time.

Figure 8.2 A transition diagram.

It would seem that the state of the system would have to tell us not only how 
many customers there are in the system, but also whether one or two are presently 
being served. However, it turns out that we can more easily solve the problem not by 
concentrating on the number of customers in the system, but rather on the number in 
queue. So let us define the state as the number of customers waiting in queue, with two
states when there is no one in queue. That is, let us have as a state space $0', 0, 1, 2, \ldots$,
with the interpretation

State           Interpretation

$0'$            No one in service
$0$             Server busy; no one waiting
$n$, $n > 0$    $n$ customers waiting

The transition diagram is shown in Figure 8.2 and the balance equations are

State             Rate at which the process leaves = rate at which it enters

$0'$              $\lambda P_{0'} = \mu P_0$
$0$               $(\lambda + \mu) P_0 = \lambda P_{0'} + \mu P_1 + \mu P_2$
$n$, $n \ge 1$    $(\lambda + \mu) P_n = \lambda P_{n-1} + \mu P_{n+2}$

Now the set of equations

$$(\lambda + \mu) P_n = \lambda P_{n-1} + \mu P_{n+2}, \quad n = 1, 2, \ldots \tag{8.13}$$

has a solution of the form

$$P_n = \alpha^n P_0$$

To see this, substitute the preceding in Equation (8.13) to obtain

$$(\lambda + \mu)\alpha^n P_0 = \lambda \alpha^{n-1} P_0 + \mu \alpha^{n+2} P_0$$

or

$$(\lambda + \mu)\alpha = \lambda + \mu \alpha^3$$

Solving this for $\alpha$ yields the following three roots:

$$\alpha = 1, \quad \alpha = \frac{-1 - \sqrt{1 + 4\lambda/\mu}}{2}, \quad \text{and} \quad \alpha = \frac{-1 + \sqrt{1 + 4\lambda/\mu}}{2}$$

As the first two are clearly not possible, it follows that

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$

Hence,

$$P_n = \alpha^n P_0,$$

$$P_{0'} = \frac{\mu}{\lambda} P_0$$

where the bottom equation follows from the first balance equation. (We can ignore the
second balance equation as one of these equations is always redundant.) To obtain $P_0$,
we use

$$P_0 + P_{0'} + \sum_{n=1}^{\infty} P_n = 1$$

or

$$P_0 \left[ 1 + \frac{\mu}{\lambda} + \sum_{n=1}^{\infty} \alpha^n \right] = 1$$

or

$$P_0 \left[ \frac{1}{1 - \alpha} + \frac{\mu}{\lambda} \right] = 1$$

or

$$P_0 = \frac{\lambda(1 - \alpha)}{\lambda + \mu(1 - \alpha)}$$

and, thus

$$P_n = \frac{\alpha^n \lambda(1 - \alpha)}{\lambda + \mu(1 - \alpha)}, \quad n \ge 0$$

$$P_{0'} = \frac{\mu(1 - \alpha)}{\lambda + \mu(1 - \alpha)} \tag{8.14}$$

where

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$

Note that for the preceding to be valid we need $\alpha < 1$, or equivalently $\lambda/\mu < 2$, which
is intuitive since the maximum service rate is $2\mu$, which must be larger than the arrival
rate $\lambda$ to avoid overloading the system.

All the relevant quantities of interest now can be determined. For instance, to determine
the proportion of customers that are served alone, we first note that the rate at
which customers are served alone is $\lambda P_{0'} + \mu P_1$, since when the system is empty a
customer will be served alone upon the next arrival and when there is one customer
in queue he will be served alone upon a departure. As the rate at which customers are
served is $\lambda$, it follows that

$$\text{proportion of customers that are served alone} = \frac{\lambda P_{0'} + \mu P_1}{\lambda} = P_{0'} + \frac{\mu}{\lambda} P_1$$



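Also,

$$L_Q = \sum_{n=1}^{\infty} n P_n = \frac{\lambda(1 - \alpha)}{\lambda + \mu(1 - \alpha)} \sum_{n=1}^{\infty} n \alpha^n \quad \text{from Equation (8.14)}$$

$$= \frac{\lambda \alpha}{(1 - \alpha)[\lambda + \mu(1 - \alpha)]} \quad \text{by algebraic identity } \sum_{n=1}^{\infty} n \alpha^n = \frac{\alpha}{(1 - \alpha)^2}$$

and

$$W_Q = \frac{L_Q}{\lambda}, \qquad W = W_Q + \frac{1}{\mu}, \qquad L = \lambda W$$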
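
The quantities of the bulk service model are easy to evaluate numerically. The following sketch computes $\alpha$, the limiting probabilities of Equation (8.14), the proportion of customers served alone, and $L_Q$, $W_Q$, $W$, $L$. The function name `bulk_service_measures` and the choice $\lambda = \mu = 1$ are assumptions made for illustration.

```python
import math

def bulk_service_measures(lam, mu):
    """Evaluate the bulk service formulas for arrival rate lam and service rate mu."""
    if lam >= 2 * mu:
        raise ValueError("the formulas require lam / mu < 2")
    alpha = (math.sqrt(1 + 4 * lam / mu) - 1) / 2
    denom = lam + mu * (1 - alpha)
    p0 = lam * (1 - alpha) / denom            # server busy, no one waiting
    p0_prime = mu * (1 - alpha) / denom       # no one in service
    l_q = lam * alpha / ((1 - alpha) * denom)
    w_q = l_q / lam
    w = w_q + 1 / mu
    return {
        "alpha": alpha,
        "P0": p0,
        "P0_prime": p0_prime,
        "served_alone": p0_prime + (mu / lam) * alpha * p0,   # P_0' + (mu/lam) P_1
        "L_Q": l_q,
        "W_Q": w_q,
        "W": w,
        "L": lam * w,
    }

# Hypothetical rates chosen only to illustrate the formulas.
lam, mu = 1.0, 1.0
result = bulk_service_measures(lam, mu)
for name, value in result.items():
    print(name, round(value, 6))
```

A useful sanity check is that $P_{0'} + P_0/(1 - \alpha)$ equals 1 and that the balance equation for state 0, which was set aside as redundant, holds for the computed values.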


8.4 Network of Queues 

8.4.1 Open Systems 

Consider a two-server system in which customers arrive at a Poisson rate $\lambda$ at server 1.
After being served by server 1 they then join the queue in front of server 2. We suppose
there is infinite waiting space at both servers. Each server serves one customer at a time
with server $i$ taking an exponential time with rate $\mu_i$ for a service, $i = 1, 2$. Such a
system is called a tandem or sequential system (see Figure 8.3).

Figure 8.3 A tandem queue.

To analyze this system we need to keep track of the number of customers at server
1 and the number at server 2. So let us define the state by the pair $(n, m)$, meaning
that there are $n$ customers at server 1 and $m$ at server 2. The balance equations are


State               Rate that the process leaves = rate that it enters

$0, 0$              $\lambda P_{0,0} = \mu_2 P_{0,1}$
$n, 0$; $n > 0$     $(\lambda + \mu_1) P_{n,0} = \mu_2 P_{n,1} + \lambda P_{n-1,0}$
$0, m$; $m > 0$     $(\lambda + \mu_2) P_{0,m} = \mu_2 P_{0,m+1} + \mu_1 P_{1,m-1}$
$n, m$; $nm > 0$    $(\lambda + \mu_1 + \mu_2) P_{n,m} = \mu_2 P_{n,m+1} + \mu_1 P_{n+1,m-1} + \lambda P_{n-1,m}$    (8.15)


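Rather than directly attempting to solve these (along with the equation $\sum_{n,m} P_{n,m} = 1$)
we shall guess at a solution and then verify that it indeed satisfies the preceding. We
first note that the situation at server 1 is just as in an M/M/1 model. Similarly, as it
was shown in Section 6.6 that the departure process of an M/M/1 queue is a Poisson
process with rate $\lambda$, it follows that what server 2 faces is also an M/M/1 queue. Hence,
the probability that there are $n$ customers at server 1 is

$$P\{n \text{ at server 1}\} = \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right)$$

and, similarly,

$$P\{m \text{ at server 2}\} = \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right)$$

Now, if the numbers of customers at servers 1 and 2 were independent random variables,
then it would follow that

$$P_{n,m} = \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right) \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right) \tag{8.16}$$

To verify that $P_{n,m}$ is indeed equal to the preceding (and thus that the number of
customers at server 1 is independent of the number at server 2), all we need do is verify
that the preceding satisfies Equations (8.15); this suffices since we know that the $P_{n,m}$
are the unique solution of Equations (8.15). Now, for instance, if we consider the first
equation of (8.15), we need to show that

$$\lambda \left(1 - \frac{\lambda}{\mu_1}\right) \left(1 - \frac{\lambda}{\mu_2}\right) = \mu_2 \left(1 - \frac{\lambda}{\mu_1}\right) \left(\frac{\lambda}{\mu_2}\right) \left(1 - \frac{\lambda}{\mu_2}\right)$$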













which is easily verified. We leave it as an exercise to show that the $P_{n,m}$, as given
by Equation (8.16), satisfy all of the equations of (8.15), and are thus the limiting
probabilities.

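From the preceding we see that $L$, the average number of customers in the system,
is given by

$$L = \sum_{n,m} (n + m) P_{n,m} = \sum_n n \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right) + \sum_m m \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right) = \frac{\lambda}{\mu_1 - \lambda} + \frac{\lambda}{\mu_2 - \lambda}$$

and from this we see that the average time a customer spends in the system is

$$W = \frac{L}{\lambda} = \frac{1}{\mu_1 - \lambda} + \frac{1}{\mu_2 - \lambda}$$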
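
The verification left as an exercise can also be spot-checked numerically. The sketch below evaluates the product form of Equation (8.16) and computes, for each state, the difference between the two sides of the corresponding balance equation in (8.15). The function names and the rates $\lambda = 2$, $\mu_1 = 3$, $\mu_2 = 5$ are illustrative assumptions; a numerical check on finitely many states is of course not a proof.

```python
from fractions import Fraction

def tandem_prob(n, m, lam, mu1, mu2):
    """Product-form candidate of Equation (8.16); zero outside the state space."""
    if n < 0 or m < 0:
        return Fraction(0)
    r1, r2 = lam / mu1, lam / mu2
    return r1 ** n * (1 - r1) * r2 ** m * (1 - r2)

def balance_residual(n, m, lam, mu1, mu2):
    """Rate out of state (n, m) minus rate into it, following Equations (8.15)."""
    def p(a, b):
        return tandem_prob(a, b, lam, mu1, mu2)
    # Server i contributes a departure rate only when it has a customer.
    rate_out = lam + (mu1 if n > 0 else 0) + (mu2 if m > 0 else 0)
    flow_in = mu2 * p(n, m + 1) + mu1 * p(n + 1, m - 1) + lam * p(n - 1, m)
    return rate_out * p(n, m) - flow_in

# Hypothetical rates with lam < mu1 and lam < mu2.
lam, mu1, mu2 = Fraction(2), Fraction(3), Fraction(5)
residuals = [balance_residual(n, m, lam, mu1, mu2)
             for n in range(6) for m in range(6)]
print(all(r == 0 for r in residuals))
```

Exact rational arithmetic is used so that each residual is exactly zero rather than merely small.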

Remarks 

(i) The result (Equation (8.16)) could have been obtained as a direct consequence
of the time reversibility of an M/M/1 queue (see Section 6.6). For not only does time
reversibility imply that the output from server 1 is a Poisson process, but it also 
implies (Exercise 26 of Chapter 6) that the number of customers at server 1 is 
independent of the past departure times from server 1. As these past departure 
times constitute the arrival process to server 2, the independence of the numbers of 
customers in the two systems follows. 

(ii) Since a Poisson arrival sees time averages, it follows that in a tandem queue the 
numbers of customers an arrival (to server 1) sees at the two servers are independent 
random variables. However, it should be noted that this does not imply that the 
waiting times of a given customer at the two servers are independent. For a
counterexample, suppose that $\lambda$ is very small with respect to $\mu_1 = \mu_2$, and thus almost all
customers have zero wait in queue at both servers. However, given that the wait in
queue of a customer at server 1 is positive, his wait in queue at server 2 also will
be positive with probability at least as large as $\frac{1}{2}$ (why?). Hence, the waiting times
in queue are not independent. Remarkably enough, however, it turns out that the 
total times (that is, service time plus wait in queue) that an arrival spends at the two 
servers are indeed independent random variables. 

The preceding result can be substantially generalized. To do so, consider a system
of $k$ servers. Customers arrive from outside the system to server $i$, $i = 1, \ldots, k$, in
accordance with independent Poisson processes at rate $r_i$; they then join the queue at $i$
until their turn at service comes. Once a customer is served by server $i$, he then joins the
queue in front of server $j$, $j = 1, \ldots, k$, with probability $P_{ij}$. Hence, $\sum_{j=1}^{k} P_{ij} \le 1$,
and $1 - \sum_{j=1}^{k} P_{ij}$ represents the probability that a customer departs the system after
being served by server $i$.

If we let $\lambda_j$ denote the total arrival rate of customers to server $j$, then the $\lambda_j$ can be
obtained as the solution of

$$\lambda_j = r_j + \sum_{i=1}^{k} \lambda_i P_{ij}, \quad j = 1, \ldots, k \tag{8.17}$$

Equation (8.17) follows since $r_j$ is the arrival rate of customers to $j$ coming from outside
the system and, as $\lambda_i$ is the rate at which customers depart server $i$ (rate in must equal
rate out), $\lambda_i P_{ij}$ is the arrival rate to $j$ of those coming from server $i$.

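It turns out that the number of customers at each of the servers is independent and
of the form

$$P\{n \text{ customers at server } j\} = \left(\frac{\lambda_j}{\mu_j}\right)^n \left(1 - \frac{\lambda_j}{\mu_j}\right), \quad n \ge 0$$

where $\mu_j$ is the exponential service rate at server $j$ and the $\lambda_j$ are the solution to
Equation (8.17). Of course, it is necessary that $\lambda_j/\mu_j < 1$ for all $j$. To prove this, we first
note that it is equivalent to asserting that the limiting probabilities
$P(n_1, n_2, \ldots, n_k) = P\{n_j \text{ at server } j, \; j = 1, \ldots, k\}$ are given by

$$P(n_1, n_2, \ldots, n_k) = \prod_{j=1}^{k} \left(\frac{\lambda_j}{\mu_j}\right)^{n_j} \left(1 - \frac{\lambda_j}{\mu_j}\right) \tag{8.18}$$

which can be verified by showing that it satisfies the balance equations for this model.
The average number of customers in the system is

$$L = \sum_{j=1}^{k} \text{average number at server } j = \sum_{j=1}^{k} \frac{\lambda_j}{\mu_j - \lambda_j}$$

The average time a customer spends in the system can be obtained from $L = \lambda W$ with
$\lambda = \sum_{j=1}^{k} r_j$. (Why not $\lambda = \sum_{j=1}^{k} \lambda_j$?) This yields

$$W = \frac{\sum_{j=1}^{k} \lambda_j/(\mu_j - \lambda_j)}{\sum_{j=1}^{k} r_j}$$
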

Remark The result embodied in Equation (8.18) is rather remarkable in that it says
that the distribution of the number of customers at server $i$ is the same as in an M/M/1
system with rates $\lambda_i$ and $\mu_i$. What is remarkable is that in the network model the arrival
process at node $i$ need not be a Poisson process. For if there is a possibility that a
customer may visit a server more than once (a situation called feedback), then the arrival
process will not be Poisson. An easy example illustrating this is to suppose that there is a
single server whose service rate is very large with respect to the arrival rate from outside.
Suppose also that with probability $p = 0.9$ a customer upon completion of service is
fed back into the system. Hence, at an arrival time epoch there is a large probability of
another arrival in a short time (namely, the feedback arrival); whereas at an arbitrary
time point there will be only a very slight chance of an arrival occurring shortly (since
$\lambda$ is so very small). Hence, the arrival process does not possess independent increments
and so cannot be Poisson.

Thus, we see that when feedback is allowed the steady-state probabilities of the 
number of customers at any given station have the same distribution as in an M/M/1 
model even though the model is not M/M/1. (Presumably such quantities as the joint 
distribution of the number at the station at two different time points will not be the 
same as for an M/M/1.) 

Example 8.8 Consider a system of two servers where customers from outside the
system arrive at server 1 at a Poisson rate 4 and at server 2 at a Poisson rate 5. The
service rates of 1 and 2 are respectively 8 and 10. A customer upon completion of
service at server 1 is equally likely to go to server 2 or to leave the system (i.e.,
$P_{11} = 0$, $P_{12} = \frac{1}{2}$); whereas a departure from server 2 will go 25 percent of the time
to server 1 and will depart the system otherwise (i.e., $P_{21} = \frac{1}{4}$, $P_{22} = 0$). Determine
the limiting probabilities, $L$, and $W$.

Solution: The total arrival rates to servers 1 and 2, call them $\lambda_1$ and $\lambda_2$, can be
obtained from Equation (8.17). That is, we have

$$\lambda_1 = 4 + \tfrac{1}{4}\lambda_2,$$

$$\lambda_2 = 5 + \tfrac{1}{2}\lambda_1$$

implying that

$$\lambda_1 = 6, \quad \lambda_2 = 8$$

Hence,

$$P\{n \text{ at server 1}, m \text{ at server 2}\} = \left(\frac{3}{4}\right)^n \frac{1}{4} \left(\frac{4}{5}\right)^m \frac{1}{5} = \frac{1}{20} \left(\frac{3}{4}\right)^n \left(\frac{4}{5}\right)^m$$

and

$$L = \frac{6}{8 - 6} + \frac{8}{10 - 8} = 7, \qquad W = \frac{L}{9} = \frac{7}{9}$$
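
The computation in Example 8.8 generalizes directly to $k$ servers. The following sketch solves the traffic equations (8.17) by Gauss-Jordan elimination in exact arithmetic and then evaluates $L$ and $W$. The function names `solve_traffic_equations` and `open_network_measures` are assumptions introduced here, and the elimination routine is a plain textbook one rather than anything prescribed by the text.

```python
from fractions import Fraction

def solve_traffic_equations(r, P):
    """Solve lambda_j = r_j + sum_i lambda_i * P[i][j] for j = 0..k-1."""
    k = len(r)
    # Augmented matrix of the linear system (I - P^T) lambda = r.
    A = [[(Fraction(1) if i == j else Fraction(0)) - P[j][i] for j in range(k)]
         + [Fraction(r[i])] for i in range(k)]
    for col in range(k):
        pivot = next(row for row in range(col, k) if A[row][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [x / pivot_value for x in A[col]]
        for row in range(k):
            if row != col and A[row][col] != 0:
                factor = A[row][col]
                A[row] = [x - factor * y for x, y in zip(A[row], A[col])]
    return [A[i][k] for i in range(k)]

def open_network_measures(r, P, mu):
    """Return (arrival rates, L, W) for the open network of Section 8.4.1."""
    lam = solve_traffic_equations(r, P)
    if any(l >= m for l, m in zip(lam, mu)):
        raise ValueError("need lambda_j < mu_j at every server")
    L = sum(l / (m - l) for l, m in zip(lam, mu))
    W = L / sum(r)          # customers enter the system at total rate sum(r)
    return lam, L, W

# Data of Example 8.8.
r = [4, 5]
P = [[Fraction(0), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(0)]]
mu = [8, 10]
lam, L, W = open_network_measures(r, P, mu)
print(lam, L, W)
```

Note that $W$ divides by the external arrival rate $\sum_j r_j$, not by $\sum_j \lambda_j$, since a customer who visits several servers is still a single customer.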


8.4.2 Closed Systems 

The queueing systems described in Section 8.4.1 are called open systems since customers
are able to enter and depart the system. A system in which new customers never
enter and existing ones never depart is called a closed system.

Let us suppose that we have $m$ customers moving among a system of $k$ servers,
where the service times at server $i$ are exponential with rate $\mu_i$, $i = 1, \ldots, k$. When
a customer completes service at server $i$, she then joins the queue in front of server
$j$, $j = 1, \ldots, k$, with probability $P_{ij}$, where we now suppose that $\sum_{j=1}^{k} P_{ij} = 1$ for all
$i = 1, \ldots, k$. That is, $P = [P_{ij}]$ is a Markov transition probability matrix, which we
shall assume is irreducible. Let $\pi = (\pi_1, \ldots, \pi_k)$ denote the stationary probabilities
for this Markov chain; that is, $\pi$ is the unique positive solution of

$$\pi_j = \sum_{i=1}^{k} \pi_i P_{ij}, \qquad \sum_{j=1}^{k} \pi_j = 1 \tag{8.19}$$

If we denote the average arrival rate (or equivalently the average service completion
rate) at server $j$ by $\lambda_m(j)$, $j = 1, \ldots, k$ then, analogous to Equation (8.17), the $\lambda_m(j)$
satisfy

$$\lambda_m(j) = \sum_{i=1}^{k} \lambda_m(i) P_{ij}$$

Hence, from (8.19) we can conclude that

$$\lambda_m(j) = \lambda_m \pi_j, \quad j = 1, 2, \ldots, k \tag{8.20}$$

where

$$\lambda_m = \sum_{j=1}^{k} \lambda_m(j) \tag{8.21}$$

From Equation (8.21), we see that $\lambda_m$ is the average service completion rate of the
entire system, that is, it is the system throughput rate.*

If we let $P_m(n_1, n_2, \ldots, n_k)$ denote the limiting probabilities

$$P_m(n_1, n_2, \ldots, n_k) = P\{n_j \text{ customers at server } j, \; j = 1, \ldots, k\}$$

then, by verifying that they satisfy the balance equation, it can be shown that

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} K_m \prod_{j=1}^{k} (\lambda_m(j)/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^{k} n_j = m \\ 0, & \text{otherwise} \end{cases}$$

* We are just using the notation $\lambda_m(j)$ and $\lambda_m$ to indicate the dependence on the number of customers in
the closed system. This will be used in recursive relations we will develop.

But from Equation (8.20) we thus obtain

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} C_m \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^{k} n_j = m \\ 0, & \text{otherwise} \end{cases} \tag{8.22}$$

where

$$C_m = \left[ \sum_{\substack{n_1, \ldots, n_k: \\ \sum_j n_j = m}} \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j} \right]^{-1} \tag{8.23}$$


Equation (8.22) is not as useful as we might suppose, for in order to utilize
it we must determine the normalizing constant $C_m$ given by Equation (8.23),
which requires summing the products $\prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}$ over all the feasible vectors
$(n_1, \ldots, n_k)$: $\sum_{j=1}^{k} n_j = m$. Hence, since there are $\binom{m+k-1}{m}$ vectors this is only
computationally feasible for relatively small values of $m$ and $k$.

We will now present an approach that will enable us to determine recursively many 
of the quantities of interest in this model without first computing the normalizing 
constants. To begin, consider a customer who has just left server i and is headed to 
server $j$, and let us determine the probability of the system as seen by this customer. In
particular, let us determine the probability that this customer observes, at that moment,
$n_l$ customers at server $l$, $l = 1, \ldots, k$, $\sum_{l=1}^{k} n_l = m - 1$. This is done as follows:

$$P\{\text{customer observes } n_l \text{ at server } l, \; l = 1, \ldots, k \mid \text{customer goes from } i \text{ to } j\}$$

$$= \frac{P\{\text{state is } (n_1, \ldots, n_i + 1, \ldots, n_j, \ldots, n_k), \text{ customer goes from } i \text{ to } j\}}{P\{\text{customer goes from } i \text{ to } j\}}$$

$$= \frac{P_m(n_1, \ldots, n_i + 1, \ldots, n_j, \ldots, n_k)\, \mu_i P_{ij}}{\sum_{\mathbf{n}: \sum_j n_j = m-1} P_m(n_1, \ldots, n_i + 1, \ldots, n_k)\, \mu_i P_{ij}}$$

$$= \frac{(\pi_i/\mu_i) \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}}{K} \quad \text{from (8.22)}$$

$$= C \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}$$

where $C$ does not depend on $n_1, \ldots, n_k$. But because the preceding is a probability
density on the set of vectors $(n_1, \ldots, n_k)$, $\sum_{j=1}^{k} n_j = m - 1$, it follows from (8.22)
that it must equal $P_{m-1}(n_1, \ldots, n_k)$. Hence,

$$P\{\text{customer observes } n_l \text{ at server } l, \; l = 1, \ldots, k \mid \text{customer goes from } i \text{ to } j\} = P_{m-1}(n_1, \ldots, n_k), \quad \sum_{i=1}^{k} n_i = m - 1 \tag{8.24}$$

As (8.24) is true for all $i$, we thus have proven the following proposition, known as the
arrival theorem.

Proposition 8.3 (The Arrival Theorem) In the closed network system with $m$ customers,
the system as seen by arrivals to server $j$ is distributed as the stationary distribution
in the same network system when there are only $m - 1$ customers.

Denote by $L_m(j)$ and $W_m(j)$ the average number of customers and the average
time a customer spends at server $j$ when there are $m$ customers in the network. Upon
conditioning on the number of customers found at server $j$ by an arrival to that server,
it follows that

$$W_m(j) = \frac{1 + E_m[\text{number at server } j \text{ as seen by an arrival}]}{\mu_j} = \frac{1 + L_{m-1}(j)}{\mu_j} \tag{8.25}$$

where the last equality follows from the arrival theorem. Now when there are $m - 1$
customers in the system, then, from Equation (8.20), $\lambda_{m-1}(j)$, the average arrival rate
to server $j$, satisfies

$$\lambda_{m-1}(j) = \lambda_{m-1} \pi_j$$

Now, applying the basic cost identity Equation (8.1) with the cost rule being that each
customer in the network system of $m - 1$ customers pays one per unit time while at
server $j$, we obtain

$$L_{m-1}(j) = \lambda_{m-1} \pi_j W_{m-1}(j) \tag{8.26}$$

Using Equation (8.25), this yields

$$W_m(j) = \frac{1 + \lambda_{m-1} \pi_j W_{m-1}(j)}{\mu_j} \tag{8.27}$$

Also using the fact that $\sum_{j=1}^{k} L_{m-1}(j) = m - 1$ (why?) we obtain, from Equation
(8.26), the following:

$$m - 1 = \lambda_{m-1} \sum_{j=1}^{k} \pi_j W_{m-1}(j)$$

or

$$\lambda_{m-1} = \frac{m - 1}{\sum_{i=1}^{k} \pi_i W_{m-1}(i)} \tag{8.28}$$

Hence, from Equation (8.27), we obtain the recursion

$$W_m(j) = \frac{1}{\mu_j} + \frac{(m - 1) \pi_j W_{m-1}(j)}{\mu_j \sum_{i=1}^{k} \pi_i W_{m-1}(i)} \tag{8.29}$$


Starting with the stationary probabilities $\pi_j$, $j = 1, \ldots, k$, and $W_1(j) = 1/\mu_j$ we can
now use Equation (8.29) to determine recursively $W_2(j), W_3(j), \ldots, W_m(j)$. We can
then determine the throughput rate by using Equation (8.28), and this will determine
$L_m(j)$ by Equation (8.26). This recursive approach is called mean value analysis.

Example 8.9 Consider a $k$-server network in which the customers move in a cyclic
permutation. That is,

$$P_{i,i+1} = 1, \quad i = 1, 2, \ldots, k - 1, \qquad P_{k,1} = 1$$

Let us determine the average number of customers at server $j$ when there are two
customers in the system. Now, for this network,

$$\pi_i = 1/k, \quad i = 1, \ldots, k$$

and as

$$W_1(j) = \frac{1}{\mu_j}$$

we obtain from Equation (8.29) that

$$W_2(j) = \frac{1}{\mu_j} + \frac{(1/k)(1/\mu_j)}{\mu_j \sum_{i=1}^{k} (1/k)(1/\mu_i)} = \frac{1}{\mu_j} + \frac{1}{\mu_j^2 \sum_{i=1}^{k} 1/\mu_i}$$

Hence, from Equation (8.28),

$$\lambda_2 = \frac{2}{\sum_{l=1}^{k} \frac{1}{k} W_2(l)} = \frac{2k}{\sum_{l=1}^{k} \left( \dfrac{1}{\mu_l} + \dfrac{1}{\mu_l^2 \sum_{i=1}^{k} 1/\mu_i} \right)}$$

and finally, using Equation (8.26),

$$L_2(j) = \lambda_2 \frac{1}{k} W_2(j) = \frac{2 \left( \dfrac{1}{\mu_j} + \dfrac{1}{\mu_j^2 \sum_{i=1}^{k} 1/\mu_i} \right)}{\sum_{l=1}^{k} \left( \dfrac{1}{\mu_l} + \dfrac{1}{\mu_l^2 \sum_{i=1}^{k} 1/\mu_i} \right)}$$
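
The mean value analysis recursion is short enough to write out in full. The sketch below iterates Equation (8.29) starting from $W_1(j) = 1/\mu_j$, then applies Equations (8.28) and (8.26) with the index shifted from $m - 1$ to $m$. The function name `mean_value_analysis` and the rates $\mu = (1, 2, 3)$ used in the cyclic illustration are assumptions, not values from the text.

```python
from fractions import Fraction

def mean_value_analysis(pi, mu, m):
    """Return (W, lam, L) for a closed network with m customers.

    pi[j]: stationary probability of server j under the routing matrix.
    mu[j]: exponential service rate at server j.
    """
    k = len(mu)
    W = [Fraction(1) / mu[j] for j in range(k)]             # W_1(j) = 1 / mu_j
    for customers in range(2, m + 1):
        denom = sum(pi[i] * W[i] for i in range(k))          # built from W_{customers-1}
        W = [Fraction(1) / mu[j]
             + (customers - 1) * pi[j] * W[j] / (mu[j] * denom)
             for j in range(k)]                               # Equation (8.29)
    lam = m / sum(pi[i] * W[i] for i in range(k))             # Equation (8.28) at level m
    L = [lam * pi[j] * W[j] for j in range(k)]                # Equation (8.26) at level m
    return W, lam, L

# Cyclic network as in Example 8.9 with k = 3 and hypothetical rates.
mu = [Fraction(1), Fraction(2), Fraction(3)]
pi = [Fraction(1, 3)] * 3
W2, lam2, L2 = mean_value_analysis(pi, mu, 2)
print(W2, lam2, L2)
```

Because the $L_m(j)$ must sum to $m$, checking that `sum(L2)` equals 2 gives a quick consistency test of an implementation.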




Another approach to learning about the stationary probabilities specified by Equation
(8.22), which finesses the computational difficulties of computing the constant $C_m$,
is to use the Gibbs sampler of Section 4.9 to generate a Markov chain having these
stationary probabilities. To begin, note that since there are always a total of $m$ customers
in the system, Equation (8.22) may equivalently be written as a joint mass function of
the numbers of customers at each of the servers $1, \ldots, k - 1$, as follows:

$$P_m(n_1, \ldots, n_{k-1}) = C_m (\pi_k/\mu_k)^{m - \sum_{j=1}^{k-1} n_j} \prod_{j=1}^{k-1} (\pi_j/\mu_j)^{n_j}$$

$$= K \prod_{j=1}^{k-1} (a_j)^{n_j}, \qquad \sum_{j=1}^{k-1} n_j \le m$$

where $a_j = (\pi_j \mu_k)/(\pi_k \mu_j)$, $j = 1, \ldots, k - 1$. Now, if $N = (N_1, \ldots, N_{k-1})$ has the
preceding joint mass function then

$$P\{N_i = n \mid N_1 = n_1, \ldots, N_{i-1} = n_{i-1}, N_{i+1} = n_{i+1}, \ldots, N_{k-1} = n_{k-1}\}$$

$$= \frac{P_m(n_1, \ldots, n_{i-1}, n, n_{i+1}, \ldots, n_{k-1})}{\sum_r P_m(n_1, \ldots, n_{i-1}, r, n_{i+1}, \ldots, n_{k-1})}$$

$$= C a_i^n, \qquad n \le m - \sum_{j \ne i} n_j$$


It follows from the preceding that we may use the Gibbs sampler to generate the
values of a Markov chain whose limiting probability mass function is $P_m(n_1, \ldots, n_{k-1})$
as follows:

1. Let $(n_1, \ldots, n_{k-1})$ be arbitrary nonnegative integers satisfying $\sum_{j=1}^{k-1} n_j \le m$.

2. Generate a random variable $I$ that is equally likely to be any of $1, \ldots, k - 1$.

3. If $I = i$, set $s = m - \sum_{j \ne i} n_j$, and generate the value of a random variable $X$
having probability mass function

$$P\{X = n\} = C a_i^n, \quad n = 0, \ldots, s$$

4. Let $n_I = X$ and go to step 2.

The successive values of the state vector $(n_1, \ldots, n_{k-1}, m - \sum_{j=1}^{k-1} n_j)$ constitute the
sequence of states of a Markov chain with the limiting distribution $P_m$. All quantities
of interest can be estimated from this sequence. For instance, the average of the values
of the $j$th coordinate of these vectors will converge to the mean number of individuals
at station $j$, the proportion of vectors whose $j$th coordinate is less than $r$ will converge
to the limiting probability that the number of individuals at station $j$ is less than $r$, and
so on.
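
The four steps above can be sketched in a few lines. The code below runs the sampler and, for comparison on a small network, also computes the exact mean number at each server by brute-force enumeration of Equation (8.22). The function names, the fixed random seed, and the example network ($k = 3$, $m = 4$, $\mu = (1, 2, 3)$, uniform $\pi$) are all assumptions made for illustration.

```python
import random
from itertools import product

def gibbs_closed_network(pi, mu, m, n_steps, seed=0):
    """Run the Gibbs sampler described above; return the visited state vectors."""
    rng = random.Random(seed)
    k = len(mu)
    a = [(pi[j] * mu[k - 1]) / (pi[k - 1] * mu[j]) for j in range(k - 1)]
    n = [0] * (k - 1)                              # step 1: a feasible start
    states = []
    for _ in range(n_steps):
        i = rng.randrange(k - 1)                   # step 2: uniform coordinate
        s = m - (sum(n) - n[i])                    # step 3: largest feasible value
        weights = [a[i] ** x for x in range(s + 1)]   # P{X = x} proportional to a_i^x
        n[i] = rng.choices(range(s + 1), weights=weights)[0]   # step 4
        states.append(tuple(n) + (m - sum(n),))
    return states

def exact_mean_counts(pi, mu, m):
    """Mean number at each server by enumerating all feasible vectors of (8.22)."""
    k = len(mu)
    total, means = 0.0, [0.0] * k
    for vec in product(range(m + 1), repeat=k):
        if sum(vec) != m:
            continue
        w = 1.0
        for j in range(k):
            w *= (pi[j] / mu[j]) ** vec[j]
        total += w
        for j in range(k):
            means[j] += vec[j] * w
    return [x / total for x in means]

# Small hypothetical network: 3 servers, 4 customers.
pi, mu, m = [1 / 3] * 3, [1.0, 2.0, 3.0], 4
states = gibbs_closed_network(pi, mu, m, n_steps=50000)
estimate = [sum(s[j] for s in states) / len(states) for j in range(3)]
print(estimate, exact_mean_counts(pi, mu, m))
```

The enumeration is only practical for small $m$ and $k$, which is exactly the limitation that motivates the sampler; it is included here purely as a reference value.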

Other quantities of interest can also be obtained from the simulation. For instance,
suppose we want to estimate $W_j$, the average amount of time a customer spends at
server $j$ on each visit. Then, as noted in the preceding, $L_j$, the average number of
customers at server $j$, can be estimated. To estimate $W_j$, we use the identity

$$L_j = \lambda_j W_j$$

where $\lambda_j$ is the rate at which customers arrive at server $j$. Setting $\lambda_j$ equal to the service
completion rate at server $j$ shows that

$$\lambda_j = P\{j \text{ is busy}\}\, \mu_j$$

Using the Gibbs sampler simulation to estimate $P\{j \text{ is busy}\}$ then leads to an estimator
of $W_j$.

8.5 The System M/G/1

8.5.1 Preliminaries: Work and Another Cost Identity

For an arbitrary queueing system, let us define the work in the system at any time t to 
be the sum of the remaining service times of all customers in the system at time t. For 
instance, suppose there are three customers in the system—the one in service having 
been there for three of his required five units of service time, and both people in queue 
having service times of six units. Then the work at that time is 2 + 6 + 6 = 14. Let V 
denote the (time) average work in the system. 

Now recall the fundamental cost equation (8.1), which states that the

    average rate at which the system earns
        = $\lambda_a \times$ average amount a customer pays

and consider the following cost rule: Each customer pays at a rate of $y$ per unit time when
his remaining service time is $y$, whether he is in queue or in service. Thus, the rate at
which the system earns is just the work in the system; so the basic identity yields

$$V = \lambda_a E[\text{amount paid by a customer}]$$

Now, let $S$ and $W_Q^*$ denote respectively the service time and the time a given customer
spends waiting in queue. Then, since the customer pays at a constant rate of $S$ per unit
time while he waits in queue and at a rate of $S - x$ after spending an amount of time $x$
in service, we have

$$E[\text{amount paid by a customer}] = E\left[ S W_Q^* + \int_0^S (S - x)\,dx \right]$$

and thus

$$V = \lambda_a E[S W_Q^*] + \frac{\lambda_a E[S^2]}{2} \tag{8.30}$$

It should be noted that the preceding is a basic queueing identity (like
Equations (8.2)-(8.4)) and as such is valid in almost all models. In addition, if a customer’s
service time is independent of his wait in queue (as is usually, but not always,
the case),† then we have from Equation (8.30) that

$$V = \lambda_a E[S] W_Q + \frac{\lambda_a E[S^2]}{2} \tag{8.31}$$


8.5.2 Application of Work to M/G/1

The M/G/1 model assumes (i) Poisson arrivals at rate $\lambda$; (ii) a general service distribution;
and (iii) a single server. In addition, we will suppose that customers are served
in the order of their arrival.

† For an example where it is not true, see Section 8.6.2.

Now, for an arbitrary customer in an M/G/1 system,

    customer’s wait in queue = work in the system when he arrives    (8.32)

This follows since there is only a single server (think about it!). Taking expectations of
both sides of Equation (8.32) yields

$$W_Q = \text{average work as seen by an arrival}$$

But, due to Poisson arrivals, the average work as seen by an arrival will equal $V$, the
time average work in the system. Hence, for the model M/G/1,

$$W_Q = V$$

The preceding in conjunction with the identity

$$V = \lambda E[S] W_Q + \frac{\lambda E[S^2]}{2}$$

yields the so-called Pollaczek-Khintchine formula,

$$W_Q = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} \tag{8.33}$$

where $E[S]$ and $E[S^2]$ are the first two moments of the service distribution.
The quantities $L$, $L_Q$, and $W$ can be obtained from Equation (8.33) as

$$L_Q = \lambda W_Q = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])},$$

$$W = W_Q + E[S] = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} + E[S],$$

$$L = \lambda W = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])} + \lambda E[S] \tag{8.34}$$
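
Equations (8.33) and (8.34) translate directly into code. The sketch below takes the arrival rate and the first two service moments and returns $W_Q$, $L_Q$, $W$, and $L$; the function name `mg1_measures` and the sample rates are assumptions made for illustration.

```python
from fractions import Fraction

def mg1_measures(lam, es, es2):
    """Return (W_Q, L_Q, W, L) for an M/G/1 queue via Equations (8.33)-(8.34).

    lam: Poisson arrival rate; es, es2: first two moments of the service time.
    """
    if lam * es >= 1:
        raise ValueError("the averages are finite only when lam * E[S] < 1")
    w_q = lam * es2 / (2 * (1 - lam * es))     # Pollaczek-Khintchine formula
    l_q = lam * w_q
    w = w_q + es
    l = lam * w
    return w_q, l_q, w, l

lam = Fraction(2)
# Exponential service at rate 3: E[S] = 1/3, E[S^2] = 2/9.
exponential = mg1_measures(lam, Fraction(1, 3), Fraction(2, 9))
# Deterministic service of length 1/3: E[S] = 1/3, E[S^2] = 1/9.
deterministic = mg1_measures(lam, Fraction(1, 3), Fraction(1, 9))
print(exponential, deterministic)
```

With exponential service the result reduces to the M/M/1 value $W_Q = \lambda/(\mu(\mu - \lambda))$, while constant service with the same mean gives exactly half that wait in queue, since $E[S^2]$ is halved.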


Remarks 

(i) For the preceding quantities to be finite, we need $\lambda E[S] < 1$. This condition is
intuitive since we know from renewal theory that if the server was always busy,
then the departure rate would be $1/E[S]$ (see Section 7.3), which must be larger
than the arrival rate $\lambda$ to keep things finite.

(ii) Since $E[S^2] = \mathrm{Var}(S) + (E[S])^2$, we see from Equations (8.33) and (8.34) that,
for fixed mean service time, $L$, $L_Q$, $W$, and $W_Q$ all increase as the variance of the
service distribution increases.

(iii) Another approach to obtain $W_Q$ is presented in Exercise 38.

8.5.3 Busy Periods

The system alternates between idle periods (when there are no customers in the system, 
and so the server is idle) and busy periods (when there is at least one customer in the 
system, and so the server is busy). 

Let $I$ and $B$ represent, respectively, the length of an idle and of a busy period.
Because $I$ represents the time from when a customer departs and leaves the system
empty until the next arrival, it follows, since arrivals are according to a Poisson process
with rate $\lambda$, that $I$ is exponential with rate $\lambda$ and thus

$$E[I] = \frac{1}{\lambda} \tag{8.35}$$

To determine $E[B]$ we argue, as in Section 8.3.3, that the long-run proportion of
time the system is empty is equal to the ratio of $E[I]$ to $E[I] + E[B]$. That is,

$$P_0 = \frac{E[I]}{E[I] + E[B]} \tag{8.36}$$


To compute $P_0$, we note from Equation (8.4) (obtained from the fundamental cost
equation by supposing that a customer pays at a rate of one per unit time while in
service) that

$$\text{average number of busy servers} = \lambda E[S]$$

However, as the left-hand side of the preceding equals $1 - P_0$ (why?), we have

$$P_0 = 1 - \lambda E[S] \tag{8.37}$$

and, from Equations (8.35)-(8.37),

$$1 - \lambda E[S] = \frac{1/\lambda}{1/\lambda + E[B]}$$

or

$$E[B] = \frac{E[S]}{1 - \lambda E[S]}$$


Another quantity of interest is $C$, the number of customers served in a busy period.
The mean of $C$ can be computed by noting that, on the average, for every $E[C]$ arrivals
exactly one will find the system empty (namely, the first customer in the busy period).
Hence,

$$a_0 = \frac{1}{E[C]}$$

and, as $a_0 = P_0 = 1 - \lambda E[S]$ because of Poisson arrivals, we see that

$$E[C] = \frac{1}{1 - \lambda E[S]}$$
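
The two busy-period means can be checked by simulation. The sketch below is only an illustration under stated assumptions: the function names, the seed, the number of runs, and the choice of exponential service with mean 1 are all hypothetical. It uses the fact that the length of a busy period and the number served in it do not depend on the order of service, so it only tracks how many customers are still pending.

```python
import random


def simulate_busy_period(lam, draw_service, rng):
    """Simulate one M/G/1 busy period; return (length B, customers served C)."""
    pending = 1        # the customer who found the system empty
    length = 0.0
    served = 0
    while pending > 0:
        s = draw_service(rng)
        pending -= 1
        served += 1
        length += s
        # Count the Poisson(lam) arrivals that occur during this service time.
        t = rng.expovariate(lam)
        while t < s:
            pending += 1
            t += rng.expovariate(lam)
    return length, served


def estimate_busy_period(lam, draw_service, runs=20000, seed=1):
    """Monte Carlo estimates of E[B] and E[C]."""
    rng = random.Random(seed)
    total_b = 0.0
    total_c = 0
    for _ in range(runs):
        b, c = simulate_busy_period(lam, draw_service, rng)
        total_b += b
        total_c += c
    return total_b / runs, total_c / runs


# Illustrative parameters: lam = 0.5, exponential service with mean 1.
lam, mean_s = 0.5, 1.0
est_b, est_c = estimate_busy_period(lam, lambda rng: rng.expovariate(1.0 / mean_s))
exact_b = mean_s / (1 - lam * mean_s)   # E[B] = E[S] / (1 - lam E[S])
exact_c = 1 / (1 - lam * mean_s)        # E[C] = 1 / (1 - lam E[S])
```

For these parameters both exact means equal 2, and the estimates should land close to that value.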
8.6 Variations on the M/G/1


8.6.1 The M/G/1 with Random-Sized Batch Arrivals


Suppose that, as in the M/G/1, arrivals occur in accordance with a Poisson process
having rate $\lambda$. But now suppose that each arrival consists not of a single customer but of
a random number of customers. As before there is a single server whose service times
have distribution $G$.

Let us denote by $\alpha_j$, $j \ge 1$, the probability that an arbitrary batch consists of $j$
customers; and let $N$ denote a random variable representing the size of a batch and so
$P\{N = j\} = \alpha_j$. Since $\lambda_a = \lambda E(N)$, the basic formula for work (Equation (8.31))
becomes

$$V = \lambda E[N]\left[E(S) W_Q + \frac{E(S^2)}{2}\right] \tag{8.38}$$


To obtain a second equation relating $V$ to $W_Q$, consider an average customer. We have
that

his wait in queue = work in system when he arrives
                    + his waiting time due to those in his batch

Taking expectations and using the fact that Poisson arrivals see time averages yields

$$\begin{aligned}
W_Q &= V + E[\text{waiting time due to those in his batch}] \\
&= V + E[W_B]
\end{aligned} \tag{8.39}$$

Now, $E(W_B)$ can be computed by conditioning on the number in the batch, but we
must be careful because the probability that our average customer comes from a batch
of size $j$ is not $\alpha_j$. For $\alpha_j$ is the proportion of batches that are of size $j$, and if we pick a
customer at random, it is more likely that he comes from a larger rather than a smaller
batch. (For instance, suppose $\alpha_1 = \alpha_{100} = \frac{1}{2}$; then half the batches are of size 1 but
100/101 of the customers will come from a batch of size 100!)

To determine the probability that our average customer came from a batch of size
$j$ we reason as follows: Let $M$ be a large number. Then of the first $M$ batches approximately
$M\alpha_j$ will be of size $j$, $j \ge 1$, and thus there would have been approximately
$jM\alpha_j$ customers that arrived in a batch of size $j$. Hence, the proportion of arrivals in
the first $M$ batches that were from batches of size $j$ is approximately $jM\alpha_j / \sum_j jM\alpha_j$.
This proportion becomes exact as $M \to \infty$, and so we see that

$$\text{proportion of customers from batches of size } j = \frac{j\alpha_j}{\sum_j j\alpha_j} = \frac{j\alpha_j}{E[N]}$$
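
The size-biasing step can be made concrete with exact rational arithmetic. This is a minimal sketch; the function name `size_biased` and the dictionary representation of the batch-size distribution are assumptions made for illustration.

```python
from fractions import Fraction


def size_biased(alpha):
    """Map batch size j -> proportion of customers arriving in size-j batches.

    alpha : dict mapping batch size j to alpha_j = P{N = j}
    """
    mean_n = sum(j * p for j, p in alpha.items())          # E[N]
    return {j: j * p / mean_n for j, p in alpha.items()}   # j * alpha_j / E[N]


# The example from the text: batches of size 1 and 100, each with probability 1/2.
alpha = {1: Fraction(1, 2), 100: Fraction(1, 2)}
biased = size_biased(alpha)
```

Here `biased[100]` is the fraction 100/101, matching the parenthetical example above.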

We are now ready to compute $E(W_B)$, the expected wait in queue due to others in the
batch:

$$E[W_B] = \sum_j E[W_B \mid \text{batch of size } j]\,\frac{j\alpha_j}{E[N]} \tag{8.40}$$

Now if there are $j$ customers in his batch, then our customer would have to wait for
$i - 1$ of them to be served if he was $i$th in line among his batch members. As he is
equally likely to be either 1st, 2nd, ..., or $j$th in line we see that

$$E[W_B \mid \text{batch is of size } j] = \sum_{i=1}^{j} (i - 1) E(S) \frac{1}{j} = \frac{j - 1}{2} E[S]$$


Substituting this in Equation (8.40) yields

$$E[W_B] = \frac{E[S]}{2E[N]} \sum_j (j - 1) j \alpha_j = \frac{E[S](E[N^2] - E[N])}{2E[N]}$$

and from Equations (8.38) and (8.39) we obtain

$$W_Q = \frac{E[S](E[N^2] - E[N])/(2E[N]) + \lambda E[N]E[S^2]/2}{1 - \lambda E[N]E[S]}$$


Remarks 


(i) Note that the condition for $W_Q$ to be finite is that

$$\lambda E(N) < \frac{1}{E[S]}$$

which again says that the arrival rate must be less than the service rate (when the
server is busy).

(ii) For fixed value of $E[N]$, $W_Q$ is increasing in $\operatorname{Var}(N)$, again indicating that “single-server
queues do not like variation.”

(iii) The other quantities $L$, $L_Q$, and $W$ can be obtained by using

$$\begin{aligned}
W &= W_Q + E[S], \\
L &= \lambda_a W = \lambda E[N] W, \\
L_Q &= \lambda E[N] W_Q
\end{aligned}$$
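
The batch-arrival formula for $W_Q$ can be evaluated directly from the moments of $N$ and $S$. The sketch below is illustrative; the function name `batch_mg1_wq` and the sample parameters are assumptions.

```python
def batch_mg1_wq(lam, en, en2, es, es2):
    """Average wait in queue for the M/G/1 with random-sized batch arrivals.

    lam      : Poisson arrival rate of batches
    en, en2  : E[N], E[N^2] for the batch size N
    es, es2  : E[S], E[S^2] for the service time S
    """
    load = lam * en * es
    if load >= 1:
        raise ValueError("need lam * E[N] * E[S] < 1 for a finite W_Q")
    e_wb = es * (en2 - en) / (2 * en)            # wait caused by one's own batch
    return (e_wb + lam * en * es2 / 2) / (1 - load)


# Batches of size 1 reduce to the ordinary Pollaczek-Khintchine formula.
wq_single = batch_mg1_wq(0.5, 1.0, 1.0, 1.0, 2.0)
# Batches always of size 2, exponential service with mean 1.
wq_pairs = batch_mg1_wq(0.2, 2.0, 4.0, 1.0, 2.0)
```

With every batch of size 1 the term $E[W_B]$ vanishes and the result agrees with Equation (8.33).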


8.6.2 Priority Queues 

Priority queueing systems are ones in which customers are classified into types and
then given service priority according to their type. Consider the situation where there
are two types of customers, which arrive according to independent Poisson processes
with respective rates $\lambda_1$ and $\lambda_2$, and have service distributions $G_1$ and $G_2$. We suppose
that type 1 customers are given service priority, in that service will never begin on a
type 2 customer if a type 1 is waiting. However, if a type 2 is being served and a type 1
arrives, we assume that the service of the type 2 is continued until completion. That is,
there is no preemption once service has begun.

Let $W_Q^i$ denote the average wait in queue of a type $i$ customer, $i = 1, 2$. Our objective
is to compute the $W_Q^i$.

First, note that the total work in the system at any time would be exactly the same no 
matter what priority rule was employed (as long as the server is always busy whenever 
there are customers in the system). This is so since the work will always decrease at a 
rate of one per unit time when the server is busy (no matter who is in service) and will 
always jump by the service time of an arrival. Hence, the work in the system is exactly 
as it would be if there was no priority rule but rather a first-come, first-served (called 
FIFO) ordering. However, under FIFO the preceding model is just M/G/1 with

$$\lambda = \lambda_1 + \lambda_2, \qquad G(x) = \frac{\lambda_1}{\lambda} G_1(x) + \frac{\lambda_2}{\lambda} G_2(x) \tag{8.41}$$

which follows since the combination of two independent Poisson processes is itself a 
Poisson process whose rate is the sum of the rates of the component processes. The 
service distribution G can be obtained by conditioning on which priority class the 
arrival is from—as is done in Equation (8.41). 

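Hence, from the results of Section 8.5, it follows that $V$, the average work in the
priority queueing system, is given by

$$\begin{aligned}
V &= \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} \\
&= \frac{\lambda\left((\lambda_1/\lambda)E[S_1^2] + (\lambda_2/\lambda)E[S_2^2]\right)}{2\left[1 - \lambda\left((\lambda_1/\lambda)E[S_1] + (\lambda_2/\lambda)E[S_2]\right)\right]} \\
&= \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1] - \lambda_2 E[S_2])}
\end{aligned} \tag{8.42}$$

where $S_i$ has distribution $G_i$, $i = 1, 2$.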

Continuing in our quest for $W_Q^i$ let us note that $S$ and $W_Q^*$, the service and wait
in queue of an arbitrary customer, are not independent in the priority model since
knowledge about $S$ gives us information as to the type of customer, which in turn gives
us information about $W_Q^*$. To get around this we will compute separately the average
amount of type 1 and type 2 work in the system. Denoting $V^i$ as the average amount
of type $i$ work we have, exactly as in Section 8.5.1,

$$V^i = \lambda_i E[S_i] W_Q^i + \frac{\lambda_i E[S_i^2]}{2}, \qquad i = 1, 2 \tag{8.43}$$


If we define

$$V_Q^i \equiv \lambda_i E[S_i] W_Q^i, \qquad V_S^i \equiv \frac{\lambda_i E[S_i^2]}{2}$$

then we may interpret $V_Q^i$ as the average amount of type $i$ work in queue, and $V_S^i$ as
the average amount of type $i$ work in service (why?).

Now we are ready to compute $W_Q^1$. To do so, consider an arbitrary type 1 arrival.
Then

his delay = amount of type 1 work in the system when he arrives
            + amount of type 2 work in service when he arrives

Taking expectations and using the fact that Poisson arrivals see time averages yields

$$\begin{aligned}
W_Q^1 &= V^1 + V_S^2 \\
&= \lambda_1 E[S_1] W_Q^1 + \frac{\lambda_1 E[S_1^2]}{2} + \frac{\lambda_2 E[S_2^2]}{2}
\end{aligned} \tag{8.44}$$

or

$$W_Q^1 = \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1])} \tag{8.45}$$


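To obtain $W_Q^2$ we first note that since $V = V^1 + V^2$, we have from Equations (8.42)
and (8.43) that

$$\begin{aligned}
\frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1] - \lambda_2 E[S_2])}
&= \lambda_1 E[S_1] W_Q^1 + \lambda_2 E[S_2] W_Q^2 + \frac{\lambda_1 E[S_1^2]}{2} + \frac{\lambda_2 E[S_2^2]}{2} \\
&= W_Q^1 + \lambda_2 E[S_2] W_Q^2 \qquad \text{from Equation (8.44)}
\end{aligned}$$

Now, using Equation (8.45), we obtain

$$\lambda_2 E[S_2] W_Q^2 = \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2}\left[\frac{1}{1 - \lambda_1 E[S_1] - \lambda_2 E[S_2]} - \frac{1}{1 - \lambda_1 E[S_1]}\right]$$

or

$$W_Q^2 = \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1] - \lambda_2 E[S_2])(1 - \lambda_1 E[S_1])} \tag{8.46}$$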


Remarks 

(i) Note that from Equation (8.45), the condition for $W_Q^1$ to be finite is that $\lambda_1 E[S_1] < 1$,
which is independent of the type 2 parameters. (Is this intuitive?) For $W_Q^2$ to be
finite, we need, from Equation (8.46), that

$$\lambda_1 E[S_1] + \lambda_2 E[S_2] < 1$$

Since the arrival rate of all customers is $\lambda = \lambda_1 + \lambda_2$, and the average service time
of a customer is $(\lambda_1/\lambda)E[S_1] + (\lambda_2/\lambda)E[S_2]$, the preceding condition is just that
the average arrival rate be less than the average service rate.

(ii) If there are $n$ types of customers, we can solve for $V^j$, $j = 1, \ldots, n$, in a similar
fashion. First, note that the total amount of work in the system of customers of types
$1, \ldots, j$ is independent of the internal priority rule concerning types $1, \ldots, j$ and
only depends on the fact that each of them is given priority over any customers of
types $j + 1, \ldots, n$. (Why is this? Reason it out!) Hence, $V^1 + \cdots + V^j$ is the same
as it would be if types $1, \ldots, j$ were considered as a single type I priority class and
types $j + 1, \ldots, n$ as a single type II priority class. Now, from Equations (8.43)
and (8.45),

$$V^{\mathrm{I}} = \frac{\lambda_{\mathrm{I}} E[S_{\mathrm{I}}^2] + \lambda_{\mathrm{I}}\lambda_{\mathrm{II}} E[S_{\mathrm{I}}] E[S_{\mathrm{II}}^2]}{2(1 - \lambda_{\mathrm{I}} E[S_{\mathrm{I}}])}$$

where

$$\begin{aligned}
\lambda_{\mathrm{I}} &= \lambda_1 + \cdots + \lambda_j, \\
\lambda_{\mathrm{II}} &= \lambda_{j+1} + \cdots + \lambda_n, \\
E[S_{\mathrm{I}}] &= \sum_{i=1}^{j} \frac{\lambda_i}{\lambda_{\mathrm{I}}} E[S_i], \\
E[S_{\mathrm{I}}^2] &= \sum_{i=1}^{j} \frac{\lambda_i}{\lambda_{\mathrm{I}}} E[S_i^2], \\
E[S_{\mathrm{II}}^2] &= \sum_{i=j+1}^{n} \frac{\lambda_i}{\lambda_{\mathrm{II}}} E[S_i^2]
\end{aligned}$$

Hence, as $V^{\mathrm{I}} = V^1 + \cdots + V^j$, we have an expression for $V^1 + \cdots + V^j$, for
each $j = 1, \ldots, n$, which then can be solved for the individual $V^1, V^2, \ldots, V^n$.
We now can obtain $W_Q^i$ from Equation (8.43). The result of all this (which we leave
for an exercise) is that

$$W_Q^i = \frac{\lambda_1 E[S_1^2] + \cdots + \lambda_n E[S_n^2]}{2\prod_{j=i-1}^{i}\left(1 - \lambda_1 E[S_1] - \cdots - \lambda_j E[S_j]\right)}, \qquad i = 1, \ldots, n \tag{8.47}$$
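
Equation (8.47) translates into a short loop over the priority classes. The following is a sketch under stated assumptions: the function name `priority_waits`, the list-based inputs (ordered from highest to lowest priority), and the use of infinity for classes whose cumulative load reaches 1 are illustrative choices.

```python
import math


def priority_waits(lams, es, es2):
    """Average waits in queue W_Q^i for a nonpreemptive priority M/G/1 queue.

    lams : arrival rates lambda_1, ..., lambda_n (class 1 has top priority)
    es   : mean service times E[S_1], ..., E[S_n]
    es2  : second moments E[S_1^2], ..., E[S_n^2]
    """
    numer = sum(l * m2 for l, m2 in zip(lams, es2))
    waits = []
    load = 0.0   # lambda_1 E[S_1] + ... + lambda_{i-1} E[S_{i-1}] (empty sum for i = 1)
    for l, m1 in zip(lams, es):
        prev_slack = 1.0 - load
        load += l * m1
        slack = 1.0 - load
        # The wait is finite only while the cumulative load stays below 1.
        waits.append(numer / (2 * prev_slack * slack) if slack > 0 else math.inf)
    return waits


# Two classes, both with exponential service of mean 1 (E[S^2] = 2).
w1, w2 = priority_waits([0.2, 0.3], [1.0, 1.0], [2.0, 2.0])
```

For $n = 2$ the loop reproduces Equations (8.45) and (8.46). In this example the weighted average $(0.2\,W_Q^1 + 0.3\,W_Q^2)/0.5$ equals 1, the FIFO value from Equation (8.33), as the work-conservation argument suggests when both classes share the same service distribution.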


8.6.3 An M/G/1 Optimization Example

Consider a single-server system where customers arrive according to a Poisson process
with rate $\lambda$, and where the service times are independent and have distribution function
$G$. Let $\rho = \lambda E[S]$, where $S$ represents a service time random variable, and suppose
that $\rho < 1$. Suppose that the server departs whenever a busy period ends and does not
return until there are $n$ customers waiting. At that time the server returns and continues
serving until the system is once again empty. If the system facility incurs costs at a
rate of $c$ per unit time per customer in the system, as well as a cost $K$ each time the
server returns, what value of $n$, $n \ge 1$, minimizes the long-run average cost per unit
time incurred by the facility, and what is this minimal cost?
To answer the preceding, let us first determine $A(n)$, the average cost per unit time
for the policy that returns the server whenever there are $n$ customers waiting. To do
so, say that a new cycle begins each time the server returns. As it is easy to see that
everything probabilistically starts over when a cycle begins, it follows from the theory
of renewal reward processes that if $C(n)$ is the cost incurred in a cycle and $T(n)$ is the
time of a cycle, then

$$A(n) = \frac{E[C(n)]}{E[T(n)]}$$


To determine $E[C(n)]$ and $E[T(n)]$, consider the time interval of length, say, $T_i$, starting
from the first time during a cycle that there are a total of $i$ customers in the system until
the first time afterward that there are only $i - 1$. Therefore, $\sum_{i=1}^{n} T_i$ is the amount of
time that the server is busy during a cycle. Adding the additional mean idle time until
$n$ customers are in the system gives

$$E[T(n)] = \sum_{i=1}^{n} E[T_i] + n/\lambda$$

Now, consider the system at the moment when a service is about to begin and there are
$i - 1$ customers waiting in queue. Since service times do not depend on the order in
which customers are served, suppose that the order of service is last come first served,
implying that service does not begin on the $i - 1$ presently in queue until these $i - 1$
are the only ones in the system. Thus, we see that the time that it takes to go from
$i$ customers in the system to $i - 1$ has the same distribution as the time it takes the
M/G/1 system to go from a single customer (just beginning service) to empty; that is,
its distribution is that of $B$, the length of an M/G/1 busy period. (Essentially the same
argument was made in Example 5.25.) Hence,

$$E[T_i] = E[B] = \frac{E[S]}{1 - \rho}$$

implying that

$$E[T(n)] = \frac{nE[S]}{1 - \lambda E[S]} + \frac{n}{\lambda} = \frac{n}{\lambda(1 - \rho)} \tag{8.48}$$


To determine $E[C(n)]$, let $C_i$ denote the cost incurred during the interval of length
$T_i$ that starts with $i - 1$ in queue and a service just beginning and ends when the $i - 1$
in queue are the only customers in the system. Thus, $K + \sum_{i=1}^{n} C_i$ represents the
total cost incurred during the busy part of the cycle. In addition, during the idle part
of the cycle there will be $i$ customers in the system for an exponential time with rate
$\lambda$, $i = 1, \ldots, n - 1$, resulting in an expected cost of $c(1 + \cdots + n - 1)/\lambda$. Consequently,

$$E[C(n)] = K + \sum_{i=1}^{n} E[C_i] + \frac{n(n - 1)c}{2\lambda} \tag{8.49}$$

To find $E[C_i]$, consider the moment when the interval of length $T_i$ begins, and let
$W_i$ be the sum of the initial service time plus the sum of the times spent in the system
by all the customers that arrive (and are served) until the moment when the interval
ends and there are only $i - 1$ customers in the system. Then,

$$C_i = (i - 1)cT_i + cW_i$$

where the first term refers to the cost incurred due to the $i - 1$ customers in queue
during the interval of length $T_i$. As it is easy to see that $W_i$ has the same distribution as
$W_b$, the sum of the times spent in the system by all arrivals in an M/G/1 busy period,
we obtain

$$E[C_i] = (i - 1)c\frac{E[S]}{1 - \rho} + cE[W_b] \tag{8.50}$$

Using Equation (8.49), this yields

$$\begin{aligned}
E[C(n)] &= K + \frac{n(n - 1)cE[S]}{2(1 - \rho)} + ncE[W_b] + \frac{n(n - 1)c}{2\lambda} \\
&= K + ncE[W_b] + \frac{n(n - 1)c}{2\lambda}\left(\frac{\rho}{1 - \rho} + 1\right) \\
&= K + ncE[W_b] + \frac{n(n - 1)c}{2\lambda(1 - \rho)}
\end{aligned}$$

Utilizing the preceding in conjunction with Equation (8.48) shows that

$$A(n) = \frac{K\lambda(1 - \rho)}{n} + \lambda c(1 - \rho)E[W_b] + \frac{c(n - 1)}{2} \tag{8.51}$$


To determine $E[W_b]$, we use the result that the average amount of time spent in the
system by a customer in the M/G/1 system is

$$W = W_Q + E[S] = \frac{\lambda E[S^2]}{2(1 - \rho)} + E[S]$$

However, if we imagine that on day $j$, $j \ge 1$, we earn an amount equal to the total
time spent in the system by the $j$th arrival at the M/G/1 system, then it follows from
renewal reward processes (since everything probabilistically restarts at the end of a
busy period) that

$$W = \frac{E[W_b]}{E[N]}$$

where $N$ is the number of customers served in an M/G/1 busy period. Since $E[N] =
1/(1 - \rho)$ we see that

$$(1 - \rho)E[W_b] = W = \frac{\lambda E[S^2]}{2(1 - \rho)} + E[S]$$

Therefore, using Equation (8.51), we obtain

$$A(n) = \frac{K\lambda(1 - \rho)}{n} + \frac{c\lambda^2 E[S^2]}{2(1 - \rho)} + c\rho + \frac{c(n - 1)}{2}$$

To determine the optimal value of $n$, treat $n$ as a continuous variable and differentiate
the preceding to obtain

$$A'(n) = \frac{-K\lambda(1 - \rho)}{n^2} + \frac{c}{2}$$

Setting this equal to 0 and solving yields that the optimal value of $n$ is

$$n^* = \sqrt{\frac{2\lambda K(1 - \rho)}{c}}$$

and the minimal average cost per unit time is

$$A(n^*) = \sqrt{2\lambda K(1 - \rho)c} + \frac{c\lambda^2 E[S^2]}{2(1 - \rho)} + c\rho - \frac{c}{2}$$
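
A small numerical illustration of the optimization follows. It is a sketch only: the function names and parameter values are assumptions, and the rounding step (comparing the integers on either side of $n^*$) is an addition to the text's continuous treatment, justified by the convexity of $A(n)$ in $n > 0$.

```python
import math


def avg_cost(n, lam, es, es2, c, K):
    """A(n): long-run average cost when the server returns at n waiting customers."""
    rho = lam * es
    return (K * lam * (1 - rho) / n
            + c * lam ** 2 * es2 / (2 * (1 - rho))
            + c * rho
            + c * (n - 1) / 2)


def best_threshold(lam, es, es2, c, K):
    """Return (continuous optimum n*, best integer n, cost at that integer)."""
    rho = lam * es
    n_star = math.sqrt(2 * lam * K * (1 - rho) / c)
    # A(n) is convex for n > 0, so the best integer is adjacent to n*.
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    best_n = min(candidates, key=lambda n: avg_cost(n, lam, es, es2, c, K))
    return n_star, best_n, avg_cost(best_n, lam, es, es2, c, K)


# Illustrative values: lam = 1, exponential service with rate 2
# (E[S] = 0.5, E[S^2] = 0.5), holding cost c = 1, return cost K = 16.
n_star, best_n, best_cost = best_threshold(lam=1.0, es=0.5, es2=0.5, c=1.0, K=16.0)
```

For these values $n^* = 4$ exactly and the minimal cost is 4.5, which agrees with the closed-form expression for $A(n^*)$.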


It is interesting to see how close we can come to the minimal average cost when 
we use a simpler policy of the following form: Whenever the server finds the system 
empty of customers she departs and then returns after a fixed time t has elapsed. Let 
us say that a new cycle begins each time the server departs. Both the expected costs 
incurred during the idle and the busy parts of a cycle are obtained by conditioning on 
N(t), the number of arrivals in the time t that the server is gone. With C(t) being the 
cost incurred during a cycle, we obtain 


$$\begin{aligned}
E[C(t) \mid N(t)] &= K + \sum_{i=1}^{N(t)} E[C_i] + cN(t)\frac{t}{2} \\
&= K + \frac{N(t)(N(t) - 1)cE[S]}{2(1 - \rho)} + N(t)cE[W_b] + cN(t)\frac{t}{2}
\end{aligned}$$

The final term of the first equality is the conditional expected cost during the idle time
in the cycle and is obtained by using that, given the number of arrivals in the time
$t$, the arrival times are independent and uniformly distributed on $(0, t)$; the second
equality used Equation (8.50). Since $N(t)$ is Poisson with mean $\lambda t$, it follows that
$E[N(t)(N(t) - 1)] = E[N^2(t)] - E[N(t)] = \lambda^2 t^2$. Thus, taking the expected value
of the preceding gives

$$\begin{aligned}
E[C(t)] &= K + \frac{\lambda^2 t^2 cE[S]}{2(1 - \rho)} + \lambda tcE[W_b] + \frac{c\lambda t^2}{2} \\
&= K + \frac{c\lambda t^2}{2(1 - \rho)} + \lambda tcE[W_b]
\end{aligned}$$

Similarly, if $T(t)$ is the time of a cycle, then

$$\begin{aligned}
E[T(t)] &= E[E[T(t) \mid N(t)]] \\
&= E[t + N(t)E[B]] \\
&= t + \frac{\rho t}{1 - \rho} \\
&= \frac{t}{1 - \rho}
\end{aligned}$$

Hence, the average cost per unit time, call it $A(t)$, is

$$A(t) = \frac{E[C(t)]}{E[T(t)]} = \frac{K(1 - \rho)}{t} + \frac{c\lambda t}{2} + c\lambda(1 - \rho)E[W_b]$$

Thus, from Equation (8.51), we see that

$$A(n/\lambda) - A(n) = c/2$$

which shows that allowing the return decision to depend on the number presently in
the system can reduce the average cost only by the amount $c/2$. ■
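
The $c/2$ gap between the two policies can be confirmed numerically. In this sketch the function names `threshold_cost` and `timer_cost` and the parameter values are assumptions; both functions substitute $(1 - \rho)E[W_b] = W$ from the derivation above.

```python
def mean_time_in_system(lam, es, es2):
    """W = lam E[S^2] / (2 (1 - rho)) + E[S], which equals (1 - rho) E[W_b]."""
    rho = lam * es
    return lam * es2 / (2 * (1 - rho)) + es


def threshold_cost(n, lam, es, es2, c, K):
    """Average cost when the server returns once n customers are waiting."""
    rho = lam * es
    w = mean_time_in_system(lam, es, es2)
    return K * lam * (1 - rho) / n + lam * c * w + c * (n - 1) / 2


def timer_cost(t, lam, es, es2, c, K):
    """Average cost when the server returns a fixed time t after departing."""
    rho = lam * es
    w = mean_time_in_system(lam, es, es2)
    return K * (1 - rho) / t + c * lam * t / 2 + c * lam * w


# Illustrative parameters; the timer t = n / lam matches the mean idle time.
lam, es, es2, c, K = 1.0, 0.5, 0.5, 1.0, 16.0
gaps = [timer_cost(n / lam, lam, es, es2, c, K) - threshold_cost(n, lam, es, es2, c, K)
        for n in range(1, 11)]
```

Every entry of `gaps` equals $c/2$ up to floating-point rounding, whatever the value of $n$.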


8.6.4 The M/G/1 Queue with Server Breakdown

Consider a single server queue in which customers arrive according to a Poisson process
with rate $\lambda$, and where the amount of service time required by each customer has
distribution $G$. Suppose, however, that when working the server breaks down at an
exponential rate $\alpha$. That is, the probability a working server will be able to work for
an additional time $t$ without breaking down is $e^{-\alpha t}$. When the server breaks down,
it immediately goes to the repair facility. The repair time is a random variable with
distribution $H$. Suppose that the customer in service when a breakdown occurs has its
service continue, when the server returns, from the point it was at when the breakdown
occurred. (Therefore, the total amount of time a customer is actually receiving service
from a working server has distribution $G$.)

By letting a customer’s “service time” include the time that the customer is waiting
for the server to come back from being repaired, the preceding is an M/G/1 queue.
If we let $T$ denote the amount of time from when a customer first enters service until
it departs the system, then $T$ is a service time random variable of this M/G/1 queue.
The average amount of time a customer spends waiting in queue before its service first
commences is, thus,

$$W_Q = \frac{\lambda E[T^2]}{2(1 - \lambda E[T])}$$

To compute $E[T]$ and $E[T^2]$, let $S$, having distribution $G$, be the service requirement
of the customer; let $N$ denote the number of times that the server breaks down while
the customer is in service; let $R_1, R_2, \ldots$ be the amounts of time the server spends in
the repair facility on its successive visits. Then,

$$T = \sum_{i=1}^{N} R_i + S$$

Conditioning on $S$ yields

$$E[T \mid S = s] = E\left[\sum_{i=1}^{N} R_i \,\Big|\, S = s\right] + s, \qquad
\operatorname{Var}(T \mid S = s) = \operatorname{Var}\left(\sum_{i=1}^{N} R_i \,\Big|\, S = s\right)$$


Now, a working server always breaks down at an exponential rate $\alpha$. Therefore, given
that a customer requires $s$ units of service time, it follows that the number of server
breakdowns while that customer is being served is a Poisson random variable with
mean $\alpha s$. Consequently, conditional on $S = s$, the random variable $\sum_{i=1}^{N} R_i$ is a
compound Poisson random variable with Poisson mean $\alpha s$. Using the results from
Examples 3.10 and 3.17, we thus obtain

$$E\left[\sum_{i=1}^{N} R_i \,\Big|\, S = s\right] = \alpha sE[R], \qquad
\operatorname{Var}\left(\sum_{i=1}^{N} R_i \,\Big|\, S = s\right) = \alpha sE[R^2]$$


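where $R$ has the repair distribution $H$. Therefore,

$$E[T \mid S] = \alpha SE[R] + S = S(1 + \alpha E[R]), \qquad
\operatorname{Var}(T \mid S) = \alpha SE[R^2]$$

Thus,

$$E[T] = E[E[T \mid S]] = E[S](1 + \alpha E[R])$$

and, by the conditional variance formula,

$$\begin{aligned}
\operatorname{Var}(T) &= E[\operatorname{Var}(T \mid S)] + \operatorname{Var}(E[T \mid S]) \\
&= \alpha E[S]E[R^2] + (1 + \alpha E[R])^2 \operatorname{Var}(S)
\end{aligned}$$

Therefore,

$$\begin{aligned}
E[T^2] &= \operatorname{Var}(T) + (E[T])^2 \\
&= \alpha E[S]E[R^2] + (1 + \alpha E[R])^2 E[S^2]
\end{aligned}$$

Consequently, assuming that $\lambda E[T] = \lambda E[S](1 + \alpha E[R]) < 1$, we obtain

$$W_Q = \frac{\lambda\alpha E[S]E[R^2] + \lambda(1 + \alpha E[R])^2 E[S^2]}{2(1 - \lambda E[S](1 + \alpha E[R]))}$$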
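
The moments of the effective service time $T$ and the resulting $W_Q$ are easy to evaluate once the first two moments of $S$ and $R$ are known. The following is a minimal sketch; the function name `breakdown_wq` and the sample parameters are assumptions.

```python
def breakdown_wq(lam, alpha, es, es2, er, er2):
    """Return (W_Q, E[T], E[T^2]) for the M/G/1 queue with server breakdown.

    lam      : Poisson arrival rate
    alpha    : breakdown rate of a working server
    es, es2  : E[S], E[S^2] for the service requirement S
    er, er2  : E[R], E[R^2] for a repair time R
    """
    et = es * (1 + alpha * er)                              # E[T]
    et2 = alpha * es * er2 + (1 + alpha * er) ** 2 * es2    # E[T^2]
    if lam * et >= 1:
        raise ValueError("need lam * E[T] < 1 for a finite W_Q")
    wq = lam * et2 / (2 * (1 - lam * et))
    return wq, et, et2


# With no breakdowns (alpha = 0) the result is the plain M/G/1 value.
wq_plain, _, _ = breakdown_wq(0.5, 0.0, 1.0, 2.0, 1.0, 2.0)
# Exponential service and repair times, both with mean 1.
wq_bd, et_bd, et2_bd = breakdown_wq(0.25, 0.5, 1.0, 2.0, 1.0, 2.0)
```

In the second example $E[T] = 1.5$, $E[T^2] = 5.5$, and $W_Q = 1.1$.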
From the preceding, we can now obtain

$$L_Q = \lambda W_Q, \qquad W = W_Q + E[T], \qquad L = \lambda W$$

Some other quantities we might be interested in are

(i) $P_w$, the proportion of time the server is working;

(ii) $P_r$, the proportion of time the server is being repaired;

(iii) $P_I$, the proportion of time the server is idle.

These quantities can all be obtained by using the queueing cost identity. For instance,
if we suppose that customers pay 1 per unit time while actually being served, then

average rate at which system earns = $P_w$,
average amount a customer pays = $E[S]$

Therefore, the identity yields

$$P_w = \lambda E[S]$$

To determine $P_r$, suppose a customer whose service is interrupted pays 1 per unit time
while the server is being repaired. Then,

average rate at which system earns = $P_r$,
average amount a customer pays = $E\left[\sum_{i=1}^{N} R_i\right] = \alpha E[S]E[R]$

yielding

$$P_r = \lambda\alpha E[S]E[R]$$

Of course, $P_I$ can be obtained from

$$P_I = 1 - P_w - P_r$$

Remark The quantities $P_w$ and $P_r$ could also have been obtained by first noting that
$1 - P_0 = \lambda E[T]$ is the proportion of time the server is either working or in repair.
Thus,

$$P_w = \lambda E[T]\frac{E[S]}{E[T]} = \lambda E[S], \qquad
P_r = \lambda E[T]\frac{E[T] - E[S]}{E[T]} = \lambda E[S]\alpha E[R] \quad ■$$
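
The three time fractions follow directly from the cost-identity results. This sketch uses a hypothetical function name, `server_time_fractions`, and illustrative parameter values.

```python
def server_time_fractions(lam, alpha, es, er):
    """Return (P_w, P_r, P_I): fractions of time working, in repair, and idle."""
    p_work = lam * es                    # P_w = lam E[S]
    p_repair = lam * alpha * es * er     # P_r = lam alpha E[S] E[R]
    p_idle = 1 - p_work - p_repair       # P_I = 1 - P_w - P_r
    if p_idle <= 0:
        raise ValueError("need lam * E[S] * (1 + alpha * E[R]) < 1")
    return p_work, p_repair, p_idle


p_w, p_r, p_i = server_time_fractions(lam=0.25, alpha=0.5, es=1.0, er=1.0)
```

Here $P_w + P_r = 0.375$, which is $\lambda E[T]$ with $E[T] = E[S](1 + \alpha E[R]) = 1.5$, consistent with the Remark.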
8.7 The Model G/M/1


The model G/M/1 assumes that the times between successive arrivals have an arbitrary
distribution $G$. The service times are exponentially distributed with rate $\mu$ and there is
a single server.

The immediate difficulty in analyzing this model stems from the fact that the number
of customers in the system is not informative enough to serve as a state space. For in
summarizing what has occurred up to the present we would need to know not only the
number in the system, but also the amount of time that has elapsed since the last arrival
(since $G$ is not memoryless). (Why need we not be concerned with the amount of time
the person being served has already spent in service?) To get around this problem we
shall only look at the system when a customer arrives; and so let us define $X_n$, $n \ge 1$,
by

$$X_n = \text{the number in the system as seen by the } n\text{th arrival}$$

It is easy to see that the process $\{X_n, n \ge 1\}$ is a Markov chain. To compute the
transition probabilities $P_{ij}$ for this Markov chain let us first note that, as long as there
are customers to be served, the number of services in any length of time $t$ is a Poisson
random variable with mean $\mu t$. This is true since the time between successive services
is exponential and, as we know, this implies that the number of services thus constitutes
a Poisson process. Hence,

$$P_{i,i+1-j} = \int_0^\infty e^{-\mu t}\frac{(\mu t)^j}{j!}\,dG(t), \qquad j = 0, 1, \ldots, i$$

which follows since if an arrival finds $i$ in the system, then the next arrival will find
$i + 1$ minus the number served, and the probability that $j$ will be served is easily
seen to equal the right side of the preceding (by conditioning on the time between the
successive arrivals).

The formula for $P_{i0}$ is a little different (it is the probability that at least $i + 1$ Poisson
events occur in a random length of time having distribution $G$) and can be obtained
from

$$P_{i0} = 1 - \sum_{j=0}^{i} P_{i,i+1-j}$$

The limiting probabilities $\pi_k$, $k = 0, 1, \ldots$, can be obtained as the unique solution of

$$\pi_k = \sum_i \pi_i P_{ik}, \quad k \ge 0, \qquad \sum_k \pi_k = 1$$

which, in this case, reduce to

$$\pi_k = \sum_{i=k-1}^{\infty} \pi_i \int_0^\infty e^{-\mu t}\frac{(\mu t)^{i+1-k}}{(i + 1 - k)!}\,dG(t), \quad k \ge 1, \qquad
\sum_{k=0}^{\infty} \pi_k = 1 \tag{8.52}$$

(We have not included the equation $\pi_0 = \sum_i \pi_i P_{i0}$ since one of the equations is always
redundant.)

To solve the preceding, let us try a solution of the form $\pi_k = c\beta^k$. Substitution into
Equation (8.52) leads to

$$\begin{aligned}
c\beta^k &= c\sum_{i=k-1}^{\infty} \beta^i \int_0^\infty e^{-\mu t}\frac{(\mu t)^{i+1-k}}{(i + 1 - k)!}\,dG(t) \\
&= c\int_0^\infty e^{-\mu t}\beta^{k-1} \sum_{i=k-1}^{\infty} \frac{(\beta\mu t)^{i+1-k}}{(i + 1 - k)!}\,dG(t)
\end{aligned} \tag{8.53}$$

However,

$$\sum_{i=k-1}^{\infty} \frac{(\beta\mu t)^{i+1-k}}{(i + 1 - k)!} = \sum_{j=0}^{\infty} \frac{(\beta\mu t)^j}{j!} = e^{\beta\mu t}$$

and thus Equation (8.53) reduces to

$$\beta^k = \beta^{k-1}\int_0^\infty e^{-\mu t(1 - \beta)}\,dG(t)$$

or

$$\beta = \int_0^\infty e^{-\mu t(1 - \beta)}\,dG(t) \tag{8.54}$$

The constant $c$ can be obtained from $\sum_k \pi_k = 1$, which implies that

$$c\sum_{k=0}^{\infty} \beta^k = 1$$

or

$$c = 1 - \beta$$

As $(\pi_k)$ is the unique solution to Equation (8.52), and $\pi_k = (1 - \beta)\beta^k$ satisfies, it
follows that

$$\pi_k = (1 - \beta)\beta^k, \qquad k = 0, 1, \ldots$$

where $\beta$ is the solution of Equation (8.54). (It can be shown that if the mean of $G$ is
greater than the mean service time $1/\mu$, then there is a unique value of $\beta$ satisfying
Equation (8.54) which is between 0 and 1.) The exact value of $\beta$ usually can only be
obtained by numerical methods.
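
One simple numerical method for Equation (8.54) is fixed-point iteration. The right side of (8.54) is the Laplace transform of $G$ evaluated at $\mu(1 - \beta)$, so the sketch below takes that transform as a function argument. The function name `solve_beta`, the tolerance, the starting point 0, and the two example interarrival distributions are assumptions made for illustration. Starting from 0, the iterates increase toward the smallest root, which is the root in $(0, 1)$ when the mean interarrival time exceeds $1/\mu$.

```python
import math


def solve_beta(mu, laplace_g, tol=1e-12, max_iter=100000):
    """Solve beta = E[exp(-mu (1 - beta) X)], X ~ G, by fixed-point iteration.

    mu        : exponential service rate
    laplace_g : function s -> E[exp(-s X)], the Laplace transform of G
    """
    beta = 0.0
    for _ in range(max_iter):
        new_beta = laplace_g(mu * (1.0 - beta))
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    raise RuntimeError("fixed-point iteration did not converge")


mu = 1.0

# Exponential interarrival times with rate lam (the M/M/1 case): beta = lam / mu.
lam = 0.5
beta_exp = solve_beta(mu, lambda s: lam / (lam + s))

# Constant interarrival time d (a D/M/1 queue): beta solves beta = exp(-mu d (1 - beta)).
d = 2.0
beta_det = solve_beta(mu, lambda s: math.exp(-s * d))
```

For exponential interarrivals the iteration recovers the known value $\beta = \lambda/\mu$, which serves as a sanity check on the method.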

As $\pi_k$ is the limiting probability that an arrival sees $k$ customers, it is just the $a_k$ as
defined in Section 8.2. Hence,

$$a_k = (1 - \beta)\beta^k, \qquad k \ge 0 \tag{8.55}$$


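We can obtain $W$ by conditioning on the number in the system when a customer arrives.
This yields

$$\begin{aligned}
W &= \sum_k E[\text{time in system} \mid \text{arrival sees } k](1 - \beta)\beta^k \\
&= \sum_k \frac{k + 1}{\mu}(1 - \beta)\beta^k
\qquad \text{(since if an arrival sees } k \text{ then he spends } k + 1 \text{ service periods in the system)} \\
&= \frac{1}{\mu(1 - \beta)}
\qquad \left(\text{by using } \sum_{k=0}^{\infty} kx^k = \frac{x}{(1 - x)^2}\right)
\end{aligned}$$

and

$$\begin{aligned}
W_Q &= W - \frac{1}{\mu} = \frac{\beta}{\mu(1 - \beta)}, \\
L &= \lambda W = \frac{\lambda}{\mu(1 - \beta)}, \\
L_Q &= \lambda W_Q = \frac{\lambda\beta}{\mu(1 - \beta)}
\end{aligned}$$

where $\lambda$ is the reciprocal of the mean interarrival time. That is,

$$\frac{1}{\lambda} = \int_0^\infty x\,dG(x) \tag{8.56}$$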


In fact, in exactly the same manner as shown for the M/M/1 in Section 8.3.1 and
Exercise 4 we can show that

$W^*$ is exponential with rate $\mu(1 - \beta)$,

$$W_Q^* = \begin{cases}
0 & \text{with probability } 1 - \beta \\
\text{exponential with rate } \mu(1 - \beta) & \text{with probability } \beta
\end{cases}$$

where $W^*$ and $W_Q^*$ are the amounts of time that a customer spends in system and queue,
respectively (their means are $W$ and $W_Q$).

Whereas $a_k = (1 - \beta)\beta^k$ is the probability that an arrival sees $k$ in the system, it
is not equal to the proportion of time during which there are $k$ in the system (since the
arrival process is not Poisson). To obtain the $P_k$ we first note that the rate at which the
number in the system changes from $k - 1$ to $k$ must equal the rate at which it changes
from $k$ to $k - 1$ (why?). Now the rate at which it changes from $k - 1$ to $k$ is equal to
the arrival rate $\lambda$ multiplied by the proportion of arrivals finding $k - 1$ in the system.
That is,

rate number in system goes from $k - 1$ to $k$ = $\lambda a_{k-1}$

Similarly, the rate at which the number in the system changes from $k$ to $k - 1$ is equal
to the proportion of time during which there are $k$ in the system multiplied by the
(constant) service rate. That is,

rate number in system goes from $k$ to $k - 1$ = $P_k\mu$

Equating these rates yields

$$P_k = \frac{\lambda}{\mu}a_{k-1}, \qquad k \ge 1$$

and so, from Equation (8.55),

$$P_k = \frac{\lambda}{\mu}(1 - \beta)\beta^{k-1}, \qquad k \ge 1$$

and, as $P_0 = 1 - \sum_{k=1}^{\infty} P_k$, we obtain

$$P_0 = 1 - \frac{\lambda}{\mu}$$
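
The distinction between what arrivals see ($a_k$) and the time-average proportions ($P_k$) can be tabulated side by side. This is an illustrative sketch; the function name `gm1_distributions` and the truncation level `kmax` are assumptions.

```python
def gm1_distributions(lam, mu, beta, kmax):
    """Return (a, p): lists of a_k and P_k for k = 0, ..., kmax in a G/M/1 queue."""
    a = [(1 - beta) * beta ** k for k in range(kmax + 1)]        # Equation (8.55)
    p = [1 - lam / mu]                                           # P_0 = 1 - lam / mu
    p += [(lam / mu) * a[k - 1] for k in range(1, kmax + 1)]     # P_k = (lam / mu) a_{k-1}
    return a, p


# With Poisson arrivals (M/M/1) we have beta = lam / mu, and a_k and P_k coincide.
a_mm1, p_mm1 = gm1_distributions(lam=0.5, mu=1.0, beta=0.5, kmax=10)
# A smaller beta with the same lam and mu (as a less variable G would give): they differ.
a_gen, p_gen = gm1_distributions(lam=0.5, mu=1.0, beta=0.2, kmax=10)
```

In the first case the two lists agree term by term, as expected when arrivals are Poisson; in the second, $a_0 = 0.8$ while $P_0 = 0.5$.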

Remarks In the foregoing analysis we guessed at a solution of the stationary probabilities
of the Markov chain of the form $\pi_k = c\beta^k$, then verified such a solution
by substituting in the stationary Equation (8.52). However, it could have been argued
directly that the stationary probabilities of the Markov chain are of this form. To do so,
define $\beta_i$ to be the expected number of times that state $i + 1$ is visited in the Markov
chain between two successive visits to state $i$, $i \ge 0$. Now it is not difficult to see (and
we will let you argue it out for yourself) that

$$\beta_0 = \beta_1 = \beta_2 = \cdots = \beta$$

Now it can be shown by using renewal reward processes that

$$\pi_{i+1} = \frac{E[\text{number of visits to state } i + 1 \text{ in an } i\text{-}i \text{ cycle}]}{E[\text{number of transitions in an } i\text{-}i \text{ cycle}]} = \frac{\beta_i}{1/\pi_i}$$

and so,

$$\pi_{i+1} = \beta_i\pi_i = \beta\pi_i, \qquad i \ge 0$$

implying, since $\sum_{i=0}^{\infty} \pi_i = 1$, that

$$\pi_i = \beta^i(1 - \beta), \qquad i \ge 0$$

8.7.1 The G/M/1 Busy and Idle Periods

Suppose that an arrival has just found the system empty (and so initiates a busy
period) and let $N$ denote the number of customers served in that busy period. Since
the $N$th arrival (after the initiator of the busy period) will also find the system empty,
it follows that $N$ is the number of transitions for the Markov chain (of Section 8.7) to
go from state 0 to state 0. Hence, $1/E[N]$ is the proportion of transitions that take the
Markov chain into state 0; or equivalently, it is the proportion of arrivals that find the
system empty. Therefore,

$$E[N] = \frac{1}{a_0} = \frac{1}{1 - \beta}$$

Also, as the next busy period begins after the $N$th interarrival, it follows that the cycle
time (that is, the sum of a busy and idle period) is equal to the time until the $N$th
interarrival. In other words, the sum of a busy and idle period can be expressed as the
sum of $N$ interarrival times. Thus, if $T_i$ is the $i$th interarrival time after the busy period
begins, then

$$\begin{aligned}
E[\text{Busy}] + E[\text{Idle}] &= E\left[\sum_{i=1}^{N} T_i\right] \\
&= E[N]E[T] \qquad \text{(by Wald’s equation)} \\
&= \frac{1}{\lambda(1 - \beta)}
\end{aligned} \tag{8.57}$$


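For a second relation between $E[\text{Busy}]$ and $E[\text{Idle}]$, we can use the same argument as
in Section 8.5.3 to conclude that

$$1 - P_0 = \frac{E[\text{Busy}]}{E[\text{Idle}] + E[\text{Busy}]}$$

and since $P_0 = 1 - \lambda/\mu$, we obtain, upon combining this with (8.57), that

$$E[\text{Busy}] = \frac{1}{\mu(1 - \beta)}, \qquad
E[\text{Idle}] = \frac{\mu - \lambda}{\lambda\mu(1 - \beta)}$$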

8.8 A Finite Source Model 

Consider a system of $m$ machines, whose working times are independent exponential
random variables with rate $\lambda$. Upon failure, a machine instantly goes to a repair facility
that consists of a single repairperson. If the repairperson is free, repair begins on the
machine; otherwise, the machine joins the queue of failed machines. When a machine
is repaired it becomes a working machine, and repair begins on a new machine from
the queue of failed machines (provided the queue is nonempty). The successive repair
times are independent random variables having density function $g$, with mean

$$\mu_R = \int_0^\infty xg(x)\,dx$$

To analyze this system, so as to determine such quantities as the average number of
machines that are down and the average time that a machine is down, we will exploit
the exponentially distributed working times to obtain a Markov chain. Specifically,
let $X_n$ denote the number of failed machines immediately after the $n$th repair occurs,
$n \ge 1$. Now, if $X_n = i > 0$, then the situation when the $n$th repair has just occurred
is that repair is about to begin on a machine, there are $i - 1$ other machines waiting
for repair, and there are $m - i$ working machines, each of which will (independently)
continue to work for an exponential time with rate $\lambda$. Similarly, if $X_n = 0$, then all
$m$ machines are working and will (independently) continue to do so for exponentially
distributed times with rate $\lambda$. Consequently, any information about earlier states of the
system will not affect the probability distribution of the number of down machines at
the moment of the next repair completion; hence, $\{X_n, n \ge 1\}$ is a Markov chain. To
determine its transition probabilities $P_{i,j}$, suppose first that $i > 0$. Conditioning on $R$,
the length of the next repair time, and making use of the independence of the $m - i$
remaining working times, yields that for $j \le m - i$

$$\begin{aligned}
P_{i,i-1+j} &= P\{j \text{ failures during } R\} \\
&= \int_0^\infty P\{j \text{ failures during } R \mid R = r\}\,g(r)\,dr \\
&= \int_0^\infty \binom{m - i}{j}(1 - e^{-\lambda r})^j (e^{-\lambda r})^{m-i-j} g(r)\,dr
\end{aligned}$$

If $i = 0$, then, because the next repair will not begin until one of the machines fails,

$$P_{0,j} = P_{1,j}, \qquad j \le m - 1$$

Let 7 xj, j = 0, ..., m — 1, denote the stationary probabilities of this Markov chain. 
That is, they are the unique solution of 

71 j = 

i 

m— 1 

E 71 i = 1 

j =o 

Therefore, after explicitly determining the transition probabilities and solving the 
preceding equations, we would know the value of $\pi_0$, the proportion of repair completions 
that leaves all machines working. Let us say that the system is “on” when all machines 
are working and “off” otherwise. (Thus, the system is on when the repairperson is idle 
and off when he is busy.) As all machines are working when the system goes back 
on, it follows from the lack of memory property of the exponential that the system 
probabilistically starts over when it goes on. Hence, this on-off system is an alternating 
renewal process. Suppose that the system has just become on, thus starting a new 
cycle, and let $R_i$, $i \ge 1$, be the time of the $i$th repair from that moment. Also, let $N$ 
denote the number of repairs in the off (busy) time of the cycle. Then, it follows that 
$B$, the length of the off period, can be expressed as 

$$
B = \sum_{i=1}^{N} R_i
$$

Although $N$ is not independent of the sequence $R_1, R_2, \ldots$, it is easy to check that it 
is a stopping time for this sequence, and thus by Wald’s equation (see Exercise 13 of 
Chapter 7) we have 

$$
E[B] = E[N]E[R] = E[N]\mu_R
$$

Also, since an on time will last until one of the machines fails, and since the minimum 
of independent exponential random variables is exponential with a rate equal to the 
sum of their rates, it follows that E[I], the mean on (idle) time in a cycle, is given by 

$$
E[I] = \frac{1}{m\lambda}
$$

Hence, $P_B$, the proportion of time that the repairperson is busy, satisfies 

$$
P_B = \frac{E[N]\mu_R}{E[N]\mu_R + 1/(m\lambda)}
$$

However, since, on average, one out of every $E[N]$ repair completions will leave all 
machines working, it follows that 

$$
\pi_0 = \frac{1}{E[N]}
$$

Consequently, 

$$
P_B = \frac{\mu_R}{\mu_R + \pi_0/(m\lambda)} \tag{8.58}
$$


Now focus attention on one of the machines, call it machine number 1, and let $P_{1,R}$ 
denote the proportion of time that machine 1 is being repaired. Since the proportion of 
time that the repairperson is busy is $P_B$, and since all machines fail at the same rate 
and have the same repair distribution, it follows that 

$$
P_{1,R} = \frac{P_B}{m} = \frac{\mu_R}{m\mu_R + \pi_0/\lambda} \tag{8.59}
$$


However, machine 1 alternates between time periods when it is working, when it is 
waiting in queue, and when it is in repair. Let $W_i$, $Q_i$, $S_i$ denote, respectively, the $i$th 
working time, the $i$th queueing time, and the $i$th repair time of machine 1, $i \ge 1$. Then, 
the proportion of time that machine 1 is being repaired during its first $n$ 
working-queue-repair cycles is as follows: 

proportion of time in the first $n$ cycles that machine 1 is being repaired 

$$
\begin{aligned}
&= \frac{\sum_{i=1}^{n} S_i}{\sum_{i=1}^{n} W_i + \sum_{i=1}^{n} Q_i + \sum_{i=1}^{n} S_i} \\
&= \frac{\sum_{i=1}^{n} S_i / n}{\sum_{i=1}^{n} W_i / n + \sum_{i=1}^{n} Q_i / n + \sum_{i=1}^{n} S_i / n}
\end{aligned}
$$

Letting $n \to \infty$ and using the strong law of large numbers to conclude that the averages 
of the $W_i$ and of the $S_i$ converge, respectively, to $1/\lambda$ and $\mu_R$, yields 

$$
P_{1,R} = \frac{\mu_R}{1/\lambda + \bar{Q} + \mu_R}
$$

where $\bar{Q}$ is the average amount of time that machine 1 spends in queue when it fails. 
Using Equation (8.59), the preceding gives 

$$
\frac{\mu_R}{m\mu_R + \pi_0/\lambda} = \frac{\mu_R}{1/\lambda + \bar{Q} + \mu_R}
$$

or, equivalently, that 

$$
\bar{Q} = (m - 1)\mu_R - (1 - \pi_0)/\lambda
$$

Moreover, since all machines are probabilistically equivalent it follows that $\bar{Q}$ is equal 
to $W_Q$, the average amount of time that a failed machine spends in queue. To determine 
the average number of machines in queue, we will make use of the basic queueing 
identity 

$$
L_Q = \lambda_a W_Q = \lambda_a \bar{Q}
$$

where $\lambda_a$ is the average rate at which machines fail. To determine $\lambda_a$, again focus 
attention on machine 1 and suppose that we earn one per unit time whenever machine 
1 is being repaired. It then follows from the basic cost identity of Equation (8.1) that 

$$
P_{1,R} = r_1 \mu_R
$$

where $r_1$ is the average rate at which machine 1 fails. Thus, from Equation (8.59), we 
obtain 

$$
r_1 = \frac{1}{m\mu_R + \pi_0/\lambda}
$$

Because all $m$ machines fail at the same rate, the preceding implies that 

$$
\lambda_a = m r_1 = \frac{m}{m\mu_R + \pi_0/\lambda}
$$

which gives that the average number of machines in queue is 

$$
L_Q = \frac{m(m - 1)\mu_R - m(1 - \pi_0)/\lambda}{m\mu_R + \pi_0/\lambda}
$$

Since the average number of machines being repaired is $P_B$, the preceding, along with 
Equation (8.58), shows that the average number of down machines is 

$$
L = L_Q + P_B = \frac{m^2\mu_R - m(1 - \pi_0)/\lambda}{m\mu_R + \pi_0/\lambda}
$$
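
As a numerical illustration of the preceding analysis, the following Python sketch carries out the whole computation under one added assumption that is not made in the text: every repair takes a constant time r, so that the integral over the repair density reduces to a binomial probability with p = 1 - e^{-λr}. The function name, the use of power iteration to obtain the stationary probabilities, and the parameter values are illustrative choices only.

```python
from math import comb, exp


def finite_source_metrics(m, lam, r, iters=5000):
    """Finite-source model: m machines, failure rate lam, ASSUMED constant repair time r."""
    p = 1.0 - exp(-lam * r)  # chance that a working machine fails during one repair

    def row(i):
        # Transition probabilities out of state i > 0 (i machines down just after a repair):
        # j of the m - i working machines fail during the next repair, giving state i - 1 + j.
        probs = [0.0] * m
        for j in range(m - i + 1):
            probs[i - 1 + j] = comb(m - i, j) * p**j * (1.0 - p) ** (m - i - j)
        return probs

    # State 0 has the same row as state 1, since repair starts only after a failure.
    P = [row(1)] + [row(i) for i in range(1, m)]

    # Power iteration for the stationary vector pi = pi P (states 0, ..., m - 1).
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]

    mu_R = r  # mean repair time under the constant-repair assumption
    pi0 = pi[0]
    denom = m * mu_R + pi0 / lam
    P_B = m * mu_R / denom                      # Equation (8.58), numerator and denominator times m
    Q_bar = (m - 1) * mu_R - (1.0 - pi0) / lam  # mean time a failed machine waits in queue
    L_Q = m * Q_bar / denom                     # lambda_a * Q_bar
    return {"pi0": pi0, "P_B": P_B, "Q_bar": Q_bar, "L_Q": L_Q, "L": L_Q + P_B}


example = finite_source_metrics(m=2, lam=1.0, r=0.5)
```

With m = 1 the chain has the single state 0, so π_0 = 1 and the sketch returns P_B = r/(r + 1/λ), the usual on-off proportion for a single machine.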


8.9 Multiserver Queues 


By and large, systems that have more than one server are much more difficult to analyze 
than those with a single server. In Section 8.9.1 we start with a Poisson arrival 
system in which no queue is allowed, and then consider in Section 8.9.2 the infinite 
capacity M/M/k system. For both of these models we are able to present the limiting 
probabilities. In Section 8.9.3 we consider the model G/M/k. The analysis here is 
similar to that of the G/M/1 (Section 8.7) except that in place of a single quantity 
$\beta$ given as the solution of an integral equation, we have k such quantities. We end in 
Section 8.9.4 with the model M/G/k for which unfortunately our previous technique 
(used in M/G/1) no longer enables us to derive $W_Q$, and we content ourselves with 
an approximation. 


8.9.1 Erlang's Loss System 

A loss system is a queueing system in which arrivals that find all servers busy do not 
enter but rather are lost to the system. The simplest such system is the M/M/k loss 
system in which customers arrive according to a Poisson process having rate $\lambda$, enter 
the system if at least one of the k servers is free, and then spend an exponential amount 
of time with rate $\mu$ being served. The balance equations for this system are 


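$$
\begin{array}{ll}
\text{State} & \text{Rate leave} = \text{rate enter} \\
0 & \lambda P_0 = \mu P_1 \\
1 & (\lambda + \mu) P_1 = 2\mu P_2 + \lambda P_0 \\
2 & (\lambda + 2\mu) P_2 = 3\mu P_3 + \lambda P_1 \\
i,\ 0 < i < k & (\lambda + i\mu) P_i = (i + 1)\mu P_{i+1} + \lambda P_{i-1} \\
k & k\mu P_k = \lambda P_{k-1}
\end{array}
$$

Rewriting gives 

$$
\begin{aligned}
\lambda P_0 &= \mu P_1, \\
\lambda P_1 &= 2\mu P_2, \\
\lambda P_2 &= 3\mu P_3, \\
&\;\;\vdots \\
\lambda P_{k-1} &= k\mu P_k
\end{aligned}
$$

or 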


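$$
\begin{aligned}
P_1 &= \frac{\lambda}{\mu} P_0, \\
P_2 &= \frac{\lambda}{2\mu} P_1 = \frac{(\lambda/\mu)^2}{2} P_0, \\
P_3 &= \frac{\lambda}{3\mu} P_2 = \frac{(\lambda/\mu)^3}{3!} P_0, \\
&\;\;\vdots \\
P_k &= \frac{\lambda}{k\mu} P_{k-1} = \frac{(\lambda/\mu)^k}{k!} P_0
\end{aligned}
$$

and using $\sum_{i=0}^{k} P_i = 1$, we obtain 

$$
P_i = \frac{(\lambda/\mu)^i / i!}{\sum_{j=0}^{k} (\lambda/\mu)^j / j!}, \quad i = 0, 1, \ldots, k
$$

Since $E[S] = 1/\mu$, where $E[S]$ is the mean service time, the preceding can be written as 

$$
P_i = \frac{(\lambda E[S])^i / i!}{\sum_{j=0}^{k} (\lambda E[S])^j / j!}, \quad i = 0, 1, \ldots, k \tag{8.60}
$$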


Consider now the same system except that the service distribution is general—that 
is, consider the M/G/k with no queue allowed. This model is sometimes called the 
Erlang loss system. It can be shown (though the proof is advanced) that Equation 
(8.60) (which is called Erlang’s loss formula) remains valid for this more general 
system. 
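
Erlang's loss formula depends on the arrival rate and the service distribution only through the product λE[S], which makes it easy to evaluate numerically. The following Python sketch computes P_0, ..., P_k as the weights (λE[S])^i / i! divided by their sum; the function name and the sample values are hypothetical. Because Poisson arrivals see time averages, the last entry P_k is also the proportion of arrivals that are lost.

```python
from math import factorial


def erlang_loss_probabilities(lam, mean_service, k):
    """Limiting probabilities P_0, ..., P_k of the k-server loss system."""
    a = lam * mean_service  # offered load, lambda * E[S]
    weights = [a**i / factorial(i) for i in range(k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values: lambda = 3, E[S] = 0.5, k = 2 servers.
probs = erlang_loss_probabilities(lam=3.0, mean_service=0.5, k=2)
blocking = probs[-1]  # proportion of arrivals that find all servers busy
```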

Remark It is easy to see that Equation (8.60) is valid when $k = 1$. For in this case, 
$L = P_1$, $W = E[S]$, and $\lambda_a = \lambda P_0$. Using that $L = \lambda_a W$ gives 

$$
P_1 = \lambda P_0 E[S]
$$

which implies, since $P_0 + P_1 = 1$, that 

$$
P_0 = \frac{1}{1 + \lambda E[S]}, \qquad P_1 = \frac{\lambda E[S]}{1 + \lambda E[S]}
$$












8.9.2 The M/M/k Queue 

The M/M/k infinite capacity queue can be analyzed by the balance equation technique. 
We leave it for you to verify that 

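$$
P_i =
\begin{cases}
\dfrac{(\lambda/\mu)^i / i!}{\displaystyle\sum_{n=0}^{k-1} \frac{(\lambda/\mu)^n}{n!} + \frac{(\lambda/\mu)^k}{k!} \frac{k\mu}{k\mu - \lambda}}, & i \le k \\[2ex]
\dfrac{(\lambda/k\mu)^i k^k}{k!} P_0, & i > k
\end{cases}
$$

We see from the preceding that we need to impose the condition $\lambda < k\mu$. 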
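
The limiting probabilities of the M/M/k queue can be checked numerically. The Python sketch below assumes the standard form of the result: for i ≤ k, P_i = ((λ/μ)^i / i!) P_0, and for i > k, P_i = ((λ/kμ)^i k^k / k!) P_0, where 1/P_0 is the sum of (λ/μ)^n / n! over n < k plus ((λ/μ)^k / k!) kμ/(kμ - λ). The function name and parameter values are illustrative.

```python
from math import factorial


def mmk_probability(i, lam, mu, k):
    """Limiting probability of i customers in an M/M/k queue (needs lam < k * mu)."""
    if lam >= k * mu:
        raise ValueError("limiting probabilities exist only when lam < k * mu")
    a = lam / mu
    norm = sum(a**n / factorial(n) for n in range(k))
    norm += (a**k / factorial(k)) * (k * mu / (k * mu - lam))
    p0 = 1.0 / norm
    if i <= k:
        return (a**i / factorial(i)) * p0
    return (lam / (k * mu)) ** i * k**k / factorial(k) * p0


# Illustrative check: the probabilities for lambda = 3, mu = 2, k = 2 sum to (nearly) 1.
total = sum(mmk_probability(i, 3.0, 2.0, 2) for i in range(400))
```

When k = 1 the two branches reduce to the M/M/1 probabilities (1 - λ/μ)(λ/μ)^i.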


8.9.3 The G/M/k Queue 


In this model we again suppose that there are k servers, each of whom serves at an 
exponential rate $\mu$. However, we now allow the time between successive arrivals to 
have an arbitrary distribution G. To ensure that a steady-state (or limiting) distribution 
exists, we assume the condition $1/\mu_G < k\mu$ where $\mu_G$ is the mean of G.* 

The analysis for this model is similar to that presented in Section 8.7 for the case 
k = 1. Namely, to avoid having to keep track of the time since the last arrival, we look 
at the system only at arrival epochs. Once again, if we define $X_n$ as the number in the 
system at the moment of the $n$th arrival, then $\{X_n, n \ge 0\}$ is a Markov chain. 

To derive the transition probabilities of the Markov chain, it helps to first note the 
relationship 

$$
X_{n+1} = X_n + 1 - Y_n, \quad n \ge 0
$$

where $Y_n$ denotes the number of departures during the interarrival time between the $n$th 
and $(n + 1)$st arrival. The transition probabilities $P_{ij}$ can now be calculated as follows: 

Case 1 $j > i + 1$. 

In this case it easily follows that $P_{ij} = 0$. 

Case 2 $j \le i + 1 \le k$. 

In this case if an arrival finds $i$ in the system, then as $i < k$ the new arrival will also 
immediately enter service. Hence, the next arrival will find $j$ if of the $i + 1$ services 
exactly $i + 1 - j$ are completed during the interarrival time. Conditioning on the length 
of this interarrival time yields 

$$
\begin{aligned}
P_{ij} &= P\{i + 1 - j \text{ of } i + 1 \text{ services are completed in an interarrival time}\} \\
&= \int_0^\infty P\{i + 1 - j \text{ of } i + 1 \text{ are completed} \mid \text{interarrival time is } t\}\, dG(t) \\
&= \int_0^\infty \binom{i+1}{j} (1 - e^{-\mu t})^{i+1-j} (e^{-\mu t})^j \, dG(t)
\end{aligned}
$$

where the last equality follows since the number of service completions in a time $t$ will 
have a binomial distribution. 

* It follows from renewal theory (Proposition 7.1) that customers arrive at rate $1/\mu_G$, and as the 
maximum service rate is $k\mu$, we clearly need that $1/\mu_G < k\mu$ for limiting probabilities to exist. 

Case 3 $i + 1 \ge j \ge k$. 

To evaluate $P_{ij}$ in this case we first note that when all servers are busy, the departure 
process is a Poisson process with rate $k\mu$ (why?). Hence, again conditioning on the 
interarrival time we have 

$$
\begin{aligned}
P_{ij} &= P\{i + 1 - j \text{ departures}\} \\
&= \int_0^\infty P\{i + 1 - j \text{ departures in time } t\}\, dG(t) \\
&= \int_0^\infty e^{-k\mu t} \frac{(k\mu t)^{i+1-j}}{(i + 1 - j)!}\, dG(t)
\end{aligned}
$$


Case 4 $i + 1 > k > j$. 

In this case since when all servers are busy the departure process is a Poisson process, 
it follows that the length of time until there will only be $k$ in the system will have a 
gamma distribution with parameters $i + 1 - k$, $k\mu$ (the time until $i + 1 - k$ events of a 
Poisson process with rate $k\mu$ occur is gamma distributed with parameters $i + 1 - k$, $k\mu$). 
Conditioning first on the interarrival time and then on the time until there are only $k$ in 
the system (call this latter random variable $T_k$) yields 

$$
\begin{aligned}
P_{ij} &= \int_0^\infty P\{i + 1 - j \text{ departures in time } t\}\, dG(t) \\
&= \int_0^\infty \int_0^t P\{i + 1 - j \text{ departures in } t \mid T_k = s\}\, k\mu e^{-k\mu s} \frac{(k\mu s)^{i-k}}{(i - k)!}\, ds\, dG(t) \\
&= \int_0^\infty \int_0^t \binom{k}{j} \left(1 - e^{-\mu(t-s)}\right)^{k-j} \left(e^{-\mu(t-s)}\right)^j k\mu e^{-k\mu s} \frac{(k\mu s)^{i-k}}{(i - k)!}\, ds\, dG(t)
\end{aligned}
$$

where the last equality follows since of the $k$ people in service at time $s$ the number 
whose service will end by time $t$ is binomial with parameters $k$ and $1 - e^{-\mu(t-s)}$. 

We now can verify either by a direct substitution into the equations $\pi_j = \sum_i \pi_i P_{ij}$, 
or by the same argument as presented in the remark at the end of Section 8.7, that the 
limiting probabilities of this Markov chain are of the form 

$$
\pi_{k-1+j} = c\beta^j, \quad j = 0, 1, \ldots
$$

Substitution into any of the equations $\pi_j = \sum_i \pi_i P_{ij}$ when $j > k$ yields that $\beta$ is 
given as the solution of 

$$
\beta = \int_0^\infty e^{-k\mu t(1 - \beta)}\, dG(t)
$$

The values $\pi_0, \pi_1, \ldots, \pi_{k-2}$ can be obtained by recursively solving the first $k - 1$ of 
the steady-state equations, and $c$ can then be computed by using $\sum_{i=0}^{\infty} \pi_i = 1$. 

If we let $W_Q^*$ denote the amount of time that a customer spends in queue, then in 
exactly the same manner as in G/M/1 we can show that 

$$
W_Q^* =
\begin{cases}
0, & \text{with probability } \sum_{i=0}^{k-1} \pi_i = 1 - \dfrac{c\beta}{1 - \beta} \\[2ex]
\text{Exp}(k\mu(1 - \beta)), & \text{with probability } \sum_{i=k}^{\infty} \pi_i = \dfrac{c\beta}{1 - \beta}
\end{cases}
$$

where $\text{Exp}(k\mu(1 - \beta))$ is an exponential random variable with rate $k\mu(1 - \beta)$. 
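
The quantity β rarely has a closed form, but it is simple to approximate. The Python sketch below assumes that β solves β = E[exp(-kμ(1 - β)T)], where T is an interarrival time with distribution G, and applies fixed-point iteration starting from 0. The function name, the choice of iteration scheme, and the idea of passing the transform s -> E[exp(-sT)] as a callable are illustrative assumptions, not part of the text.

```python
def solve_beta(k, mu, arrival_transform, tol=1e-12, max_iter=100000):
    """Fixed-point iteration for beta = E[exp(-k * mu * (1 - beta) * T)].

    arrival_transform(s) must return E[exp(-s * T)] for an interarrival time T ~ G.
    """
    beta = 0.0
    for _ in range(max_iter):
        new_beta = arrival_transform(k * mu * (1.0 - beta))
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    raise RuntimeError("fixed-point iteration did not converge")


# Illustrative check with Poisson arrivals at rate lam, where E[exp(-s*T)] = lam / (lam + s).
lam, mu, k = 3.0, 2.0, 2
beta = solve_beta(k, mu, lambda s: lam / (lam + s))
```

For exponential interarrival times the equation becomes a quadratic whose root below 1 is λ/(kμ), so the value returned here should be close to 0.75.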

8.9.4 The M/G/k Queue 

In this section we consider the M/G/k system in which customers arrive at a Poisson 
rate $\lambda$ and are served by any of k servers, each of whom has the service distribution G. 
If we attempt to mimic the analysis presented in Section 8.5 for the M/G/1 system, 
then we would start with the basic identity 

$$
V = \lambda E[S] W_Q + \lambda E[S^2]/2 \tag{8.61}
$$

and then attempt to derive a second equation relating $V$ and $W_Q$. 

Now if we consider an arbitrary arrival, then we have the following identity: 

$$
\text{work in system when customer arrives} = k \times \text{time customer spends in queue} + R \tag{8.62}
$$


where R is the sum of the remaining service times of all other customers in service at 
the moment when our arrival enters service. 

The foregoing follows because while the arrival is waiting in queue, work is being 
processed at a rate k per unit time (since all servers are busy). Thus, an amount of work 
k × time in queue is processed while he waits in queue. Now, all of this work was present 
when he arrived and in addition the remaining work on those still being served when 
he enters service was also present when he arrived, so we obtain Equation (8.62). 
For an illustration, suppose that there are three servers all of whom are busy when the 
customer arrives. Suppose, in addition, that there are no other customers in the system 
and also that the remaining service times of the three people in service are 3, 6, and 7. 
Hence, the work seen by the arrival is 3 + 6 + 7 = 16. Now the arrival will spend 3 
time units in queue, and at the moment he enters service, the remaining times of the 
other two customers are 6 − 3 = 3 and 7 − 3 = 4. Hence, R = 3 + 4 = 7 and as a 
check of Equation (8.62) we see that 16 = 3 × 3 + 7. 
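
The same bookkeeping can be written as a few lines of Python. The sketch below covers only the situation of the illustration, namely an arrival who finds all k servers busy and nobody waiting; the function name is hypothetical.

```python
def check_work_identity(remaining, k):
    """Both sides of Equation (8.62) for an arrival finding k busy servers and an empty queue."""
    if len(remaining) != k:
        raise ValueError("expected one remaining service time per server")
    work_on_arrival = sum(remaining)
    time_in_queue = min(remaining)   # the arrival waits until the first server frees up
    others = sorted(remaining)[1:]   # customers still in service at that moment
    R = sum(t - time_in_queue for t in others)
    return work_on_arrival, k * time_in_queue + R


lhs, rhs = check_work_identity([3, 6, 7], k=3)
```

Here `lhs` is the work found on arrival and `rhs` is k × time in queue + R, so the two values agree whenever the identity holds.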

Taking expectations of Equation (8.62) and using the fact that Poisson arrivals see 
time averages, we obtain 

$$
V = kW_Q + E[R]
$$

which, along with Equation (8.61), would enable us to solve for $W_Q$ if we could compute 
$E[R]$. However, there is no known method for computing $E[R]$ and, in fact, there is no 
known exact formula for $W_Q$. The following approximation for $W_Q$ was obtained in 
Reference 6 by using the foregoing approach and then approximating $E[R]$: 

$$
W_Q \approx \frac{\lambda^k E[S^2] (E[S])^{k-1}}{2(k - 1)!\,(k - \lambda E[S])^2 \left[ \displaystyle\sum_{n=0}^{k-1} \frac{(\lambda E[S])^n}{n!} + \frac{(\lambda E[S])^k}{(k - 1)!\,(k - \lambda E[S])} \right]} \tag{8.63}
$$


The preceding approximation has been shown to be quite close to $W_Q$ when the service 
distribution is gamma. It is also exact when G is exponential. 
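
The approximation needs only λ, k, and the first two moments of the service time. The Python sketch below assumes the form W_Q ≈ λ^k E[S²] (E[S])^(k-1) / (2 (k-1)! (k - λE[S])² D), where D is the sum of (λE[S])^n / n! over n < k plus (λE[S])^k / ((k-1)! (k - λE[S])). The function and argument names are hypothetical.

```python
from math import factorial


def mgk_wq_approx(lam, k, es, es2):
    """Approximate mean time in queue for M/G/k; es = E[S], es2 = E[S^2]."""
    a = lam * es
    if a >= k:
        raise ValueError("need lam * E[S] < k")
    bracket = sum(a**n / factorial(n) for n in range(k))
    bracket += a**k / (factorial(k - 1) * (k - a))
    numerator = lam**k * es2 * es ** (k - 1)
    return numerator / (2 * factorial(k - 1) * (k - a) ** 2 * bracket)


# Exponential service at rate 2 (E[S] = 0.5, E[S^2] = 0.5), lambda = 3, k = 2 servers.
wq = mgk_wq_approx(lam=3.0, k=2, es=0.5, es2=0.5)
```

With k = 1 the expression reduces to λE[S²] / (2(1 - λE[S])), the exact M/G/1 value, which gives a quick sanity check on any implementation.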


Exercises 

1. For the M/M/1 queue, compute 

(a) the expected number of arrivals during a service period and 

(b) the probability that no customers arrive during a service period. 

Hint: “Condition.” 

*2. Machines in a factory break down at an exponential rate of six per hour. There is 
a single repairman who fixes machines at an exponential rate of eight per hour. 
The cost incurred in lost production when machines are out of service is $10 per 
hour per machine. What is the average cost rate incurred due to failed machines? 

3. The manager of a market can hire either Mary or Alice. Mary, who gives service 
at an exponential rate of 20 customers per hour, can be hired at a rate of $3 per 
hour. Alice, who gives service at an exponential rate of 30 customers per hour, 
can be hired at a rate of $C per hour. The manager estimates that, on the average, 
each customer’s time is worth $1 per hour and should be accounted for in the 
model. Assume customers arrive at a Poisson rate of 10 per hour. 

(a) What is the average cost per hour if Mary is hired? If Alice is hired? 

(b) Find C if the average cost per hour is the same for Mary and Alice. 

4. Suppose that a customer of the M/M/1 system spends the amount of time x > 0 
waiting in queue before entering service. 

(a) Show that, conditional on the preceding, the number of other customers that 
were in the system when the customer arrived is distributed as 1 + P, where 
P is a Poisson random variable with mean $\lambda x$. 

(b) Let $W_Q^*$ denote the amount of time that an M/M/1 customer spends in queue. 
As a by-product of your analysis in part (a), show that 

$$
P\{W_Q^* \le x\} =
\begin{cases}
1 - \dfrac{\lambda}{\mu} & \text{if } x = 0 \\[2ex]
1 - \dfrac{\lambda}{\mu} + \dfrac{\lambda}{\mu}\left(1 - e^{-(\mu - \lambda)x}\right) & \text{if } x > 0
\end{cases}
$$

5. It follows from Exercise 4 that if, in the M/M/1 model, $W_Q^*$ is the amount of 
time that a customer spends waiting in queue, then 

$$
W_Q^* =
\begin{cases}
0, & \text{with probability } 1 - \lambda/\mu \\
\text{Exp}(\mu - \lambda), & \text{with probability } \lambda/\mu
\end{cases}
$$

where $\text{Exp}(\mu - \lambda)$ is an exponential random variable with rate $\mu - \lambda$. Using this, 
find $\text{Var}(W_Q^*)$. 

*6. Show that W is smaller in an M/M/1 model having arrivals at rate $\lambda$ and service 
at rate $2\mu$ than it is in a two-server M/M/2 model with arrivals at rate $\lambda$ and with 
each server at rate $\mu$. Can you give an intuitive explanation for this result? Would 
it also be true for $W_Q$? 

7. Consider the M/M/1 queue with impatient customers model as presented in 
Example 8.7. Give your answers in terms of the limiting probabilities $P_n$, $n \ge 0$. 

(a) What is the average amount of time that a customer spends in queue? 

(b) If $e_n$ denotes the probability that a customer who finds n others in the system 
upon arrival will be served, find $e_n$, $n \ge 0$. 

(c) Find the conditional probability that a served customer found n in the system 
upon arrival. That is, find P{arrival finds n | arrival is served}. 

(d) Find the average amount of time spent in queue by a customer that is served. 

(e) Find the average amount of time spent in queue by a customer that departs 
before entering service. 

8. A facility produces items according to a Poisson process with rate $\lambda$. However, 
it has shelf space for only k items and so it shuts down production whenever k 
items are present. Customers arrive at the facility according to a Poisson process 
with rate $\mu$. Each customer wants one item and will immediately depart either 
with the item or empty handed if there is no item available. 

(a) Find the proportion of customers that go away empty handed. 

(b) Find the average time that an item is on the shelf. 

(c) Find the average number of items on the shelf. 

9. A group of n customers moves around among two servers. Upon completion of 
service, the served customer then joins the queue (or enters service if the server 
is free) at the other server. All service times are exponential with rate $\mu$. Find the 
proportion of time that there are j customers at server 1, j = 0, ..., n. 

10. A group of m customers frequents a single-server station in the following manner. 
When a customer arrives, he or she either enters service if the server is free or 
joins the queue otherwise. Upon completing service the customer departs the 
system, but then returns after an exponential time with rate $\theta$. All service times 
are exponentially distributed with rate $\mu$. 

(a) Find the average rate at which customers enter the station. 

(b) Find the average time that a customer spends in the station per visit. 

11. Families arrive at a taxi stand according to a Poisson process with rate $\lambda$. An 
arriving family finding N other families waiting for a taxi does not wait. Taxis 
arrive at the taxi stand according to a Poisson process with rate $\mu$. A taxi finding M 
other taxis waiting does not wait. Derive expressions for the following quantities. 

(a) The proportion of time there are no families waiting. 

(b) The proportion of time there are no taxis waiting. 

(c) The average amount of time that a family waits. 

(d) The average amount of time that a taxi waits. 

(e) The fraction of families that take taxis. 

Now redo the problem if we assume that $N = M = \infty$ and that each family will 
only wait for an exponential time with rate $\alpha$ before seeking other transportation, 
and each taxi will only wait for an exponential time with rate $\beta$ before departing 
without a fare. 

*12. A supermarket has two exponential checkout counters, each operating at rate $\mu$. 
Arrivals are Poisson at rate $\lambda$. The counters operate in the following way: 

(i) One queue feeds both counters. 

(ii) One counter is operated by a permanent checker and the other by a stock 
clerk who instantaneously begins checking whenever there are two or more 
customers in the system. The clerk returns to stocking whenever he completes 
a service, and there are fewer than two customers in the system. 

(a) Find $P_n$, the proportion of time there are n in the system. 

(b) At what rate does the number in the system go from 0 to 1? From 2 to 1? 

(c) What proportion of time is the stock clerk checking? 

Hint: Be a little careful when there is one in the system. 

13. Two customers move about among three servers. Upon completion of service at 
server i, the customer leaves that server and enters service at whichever of the 
other two servers is free. (Therefore, there are always two busy servers.) If the 
service times at server i are exponential with rate $\mu_i$, i = 1, 2, 3, what proportion 
of time is server i idle? 

14. Consider a queueing system having two servers and no queue. There are two types 
of customers. Type 1 customers arrive according to a Poisson process having rate 
$\lambda_1$, and will enter the system if either server is free. The service time of a type 1 
customer is exponential with rate $\mu_1$. Type 2 customers arrive according to a 
Poisson process having rate $\lambda_2$. A type 2 customer requires the simultaneous 
use of both servers; hence, a type 2 arrival will only enter the system if both 
servers are free. The time that it takes (the two servers) to serve a type 2 customer 
is exponential with rate $\mu_2$. Once a service is completed on a customer, that 
customer departs the system. 

(a) Define states to analyze the preceding model. 

(b) Give the balance equations. 

In terms of the solution of the balance equations, find 

(c) the average amount of time an entering customer spends in the system; 

(d) the fraction of served customers that are type 1. 

15. Consider a sequential-service system consisting of two servers, A and B. Arriving 
customers will enter this system only if server A is free. If a customer does enter, 
then he is immediately served by server A. When his service by A is completed, he 
then goes to B if B is free, or if B is busy, he leaves the system. Upon completion 
of service at server B, the customer departs. Assume that the (Poisson) arrival 
rate is two customers an hour, and that A and B serve at respective (exponential) 
rates of four and two customers an hour. 

(a) What proportion of customers enter the system? 

(b) What proportion of entering customers receive service from B? 

(c) What is the average number of customers in the system? 

(d) What is the average amount of time that an entering customer spends in the 
system? 

16. Customers arrive at a two-server system according to a Poisson process having 
rate $\lambda = 5$. An arrival finding server 1 free will begin service with that server. An 
arrival finding server 1 busy and server 2 free will enter service with server 2. An 
arrival finding both servers busy goes away. Once a customer is served by either 
server, he departs the system. The service times at server i are exponential with 
rates $\mu_i$, where $\mu_1 = 4$, $\mu_2 = 2$. 

(a) What is the average time an entering customer spends in the system? 

(b) What proportion of time is server 2 busy? 

17. Customers arrive at a two-server station in accordance with a Poisson process 
with a rate of two per hour. Arrivals finding server 1 free begin service with that 
server. Arrivals finding server 1 busy and server 2 free begin service with server 2. 
Arrivals finding both servers busy are lost. When a customer is served by server 1, 
she then either enters service with server 2 if 2 is free or departs the system if 2 is 
busy. A customer completing service at server 2 departs the system. The service 
times at server 1 and server 2 are exponential random variables with respective 
rates of four and six per hour. 

(a) What fraction of customers do not enter the system? 

(b) What is the average amount of time that an entering customer spends in the 
system? 

(c) What fraction of entering customers receives service from server 1? 

18. Arrivals to a three-server system are according to a Poisson process with rate $\lambda$. 
Arrivals finding server 1 free enter service with 1. Arrivals finding 1 busy but 2 
free enter service with 2. Arrivals finding both 1 and 2 busy do not join the system. 
After completion of service at either 1 or 2 the customer will then either go to 
server 3 if 3 is free or depart the system if 3 is busy. After service at 3 customers 
depart the system. The service times at server i are exponential with rate $\mu_i$, i = 1, 2, 3. 

(a) Define states to analyze the above system. 

(b) Give the balance equations. 

(c) In terms of the solution of the balance equations, what is the average time 
that an entering customer spends in the system? 

(d) Find the probability that a customer who arrives when the system is empty is 
served by server 3. 

19. The economy alternates between good and bad periods. During good times 
customers arrive at a certain single-server queueing system in accordance with a 
Poisson process with rate $\lambda_1$, and during bad times they arrive in accordance with 
a Poisson process with rate $\lambda_2$. A good time period lasts for an exponentially 
distributed time with rate $\alpha_1$, and a bad time period lasts for an exponential time 
with rate $\alpha_2$. An arriving customer will only enter the queueing system if the 
server is free; an arrival finding the server busy goes away. All service times are 
exponential with rate $\mu$. 

(a) Define states so as to be able to analyze this system. 

(b) Give a set of linear equations whose solution will yield the long-run proportion 
of time the system is in each state. 

In terms of the solutions of the equations in part (b), 

(c) what proportion of time is the system empty? 

(d) what is the average rate at which customers enter the system? 

20. There are two types of customers. Type 1 and 2 customers arrive in accordance 
with independent Poisson processes with respective rates $\lambda_1$ and $\lambda_2$. There are two 
servers. A type 1 arrival will enter service with server 1 if that server is free; if 
server 1 is busy and server 2 is free, then the type 1 arrival will enter service with 
server 2. If both servers are busy, then the type 1 arrival will go away. A type 2 
customer can only be served by server 2; if server 2 is free when a type 2 customer 
arrives, then the customer enters service with that server. If server 2 is busy when 
a type 2 arrives, then that customer goes away. Once a customer is served by 
either server, he departs the system. Service times at server i are exponential with 
rate $\mu_i$, i = 1, 2. 

Suppose we want to find the average number of customers in the system. 

(a) Define states. 

(b) Give the balance equations. Do not attempt to solve them. 

In terms of the long-run probabilities, what is 

(c) the average number of customers in the system? 

(d) the average time a customer spends in the system? 

*21. Suppose in Exercise 20 we want to find out the proportion of time there is a type 
1 customer with server 2. In terms of the long-run probabilities given in Exercise 
20, what is 

(a) the rate at which a type 1 customer enters service with server 2? 

(b) the rate at which a type 2 customer enters service with server 2? 

(c) the fraction of server 2’s customers that are type 1? 

(d) the proportion of time that a type 1 customer is with server 2? 

22. Customers arrive at a single-server station in accordance with a Poisson process 
with rate $\lambda$. All arrivals that find the server free immediately enter service. All 
service times are exponentially distributed with rate $\mu$. An arrival that finds the 
server busy will leave the system and roam around “in orbit” for an exponential 
time with rate $\theta$ at which time it will then return. If the server is busy when an 
orbiting customer returns, then that customer returns to orbit for another 
exponential time with rate $\theta$ before returning again. An arrival that finds the server 
busy and N other customers in orbit will depart and not return. That is, N is the 
maximum number of customers in orbit. 

(a) Define states. 

(b) Give the balance equations. 

In terms of the solution of the balance equations, find 

(c) the proportion of all customers that are eventually served; 

(d) the average time that a served customer spends waiting in orbit. 

23. Consider the M/M/1 system in which customers arrive at rate $\lambda$ and the server 
serves at rate $\mu$. However, suppose that in any interval of length h in which the 
server is busy there is a probability $\alpha h + o(h)$ that the server will experience a 
breakdown, which causes the system to shut down. All customers that are in the 
system depart, and no additional arrivals are allowed to enter until the breakdown 
is fixed. The time to fix a breakdown is exponentially distributed with rate $\beta$. 

(a) Define appropriate states. 

(b) Give the balance equations. 

In terms of the long-run probabilities, 

(c) what is the average amount of time that an entering customer spends in the 
system? 

(d) what proportion of entering customers complete their service? 

(e) what proportion of customers arrive during a breakdown? 

*24. Reconsider Exercise 23, but this time suppose that a customer that is in the system 
when a breakdown occurs remains there while the server is being fixed. In addition, 
suppose that new arrivals during a breakdown period are allowed to enter the 
system. What is the average time a customer spends in the system? 

25. Poisson ($\lambda$) arrivals join a queue in front of two parallel servers A and B, having 
exponential service rates $\mu_A$ and $\mu_B$ (see Figure 8.4). When the system is empty, 
arrivals go into server A with probability $\alpha$ and into B with probability $1 - \alpha$. 
Otherwise, the head of the queue takes the first free server. 

(a) Define states and set up the balance equations. Do not solve. 

Figure 8.4 

(b) In terms of the probabilities in part (a), what is the average number in the 
system? Average number of servers idle? 

(c) In terms of the probabilities in part (a), what is the probability that an arbitrary 
arrival will get serviced in A? 

26. In a queue with unlimited waiting space, arrivals are Poisson (parameter $\lambda$) and 
service times are exponentially distributed (parameter $\mu$). However, the server 
waits until K people are present before beginning service on the first customer; 
thereafter, he services one at a time until all K units, and all subsequent arrivals, 
are serviced. The server is then “idle” until K new arrivals have occurred. 

(a) Define an appropriate state space, draw the transition diagram, and set up the 
balance equations. 

(b) In terms of the limiting probabilities, what is the average time a customer 
spends in queue? 

(c) What conditions on $\lambda$ and $\mu$ are necessary? 

27. Consider a single-server exponential system in which ordinary customers arrive 
at a rate $\lambda$ and have service rate $\mu$. In addition, there is a special customer who 
has a service rate $\mu_1$. Whenever this special customer arrives, she goes directly 
into service (if anyone else is in service, then this person is bumped back into 
queue). When the special customer is not being serviced, she spends an exponential 
amount of time (with mean $1/\theta$) out of the system. 

(a) What is the average arrival rate of the special customer? 

(b) Define an appropriate state space and set up balance equations. 

(c) Find the probability that an ordinary customer is bumped n times. 

*28. Let D denote the time between successive departures in a stationary M/M/1 
queue with $\lambda < \mu$. Show, by conditioning on whether or not a departure has left 
the system empty, that D is exponential with rate $\lambda$. 

Hint: By conditioning on whether or not the departure has left the system empty 
we see that 

$$
D =
\begin{cases}
\text{Exponential}(\mu), & \text{with probability } \lambda/\mu \\
\text{Exponential}(\lambda) * \text{Exponential}(\mu), & \text{with probability } 1 - \lambda/\mu
\end{cases}
$$

where $\text{Exponential}(\lambda) * \text{Exponential}(\mu)$ represents the sum of two independent 
exponential random variables having rates $\mu$ and $\lambda$. Now use moment-generating 
functions to show that D has the required distribution. 

Note that the preceding does not prove that the departure process is Poisson. To 
prove this we need to show not only that the interdeparture times are all exponential 
with rate $\lambda$, but also that they are independent. 

29. Potential customers arrive to a single-server hair salon according to a Poisson 
process with rate $\lambda$. A potential customer who finds the server free enters the 
system; a potential customer who finds the server busy goes away. Each potential 
customer is type i with probability $p_i$, where $p_1 + p_2 + p_3 = 1$. Type 1 customers 
have their hair washed by the server; type 2 customers have their hair cut by the 
server; and type 3 customers have their hair first washed and then cut by the server. 
The time that it takes the server to wash hair is exponentially distributed with rate 
$\mu_1$, and the time that it takes the server to cut hair is exponentially distributed 
with rate $\mu_2$. 

(a) Explain how this system can be analyzed with four states. 

(b) Give the equations whose solution yields the proportion of time the system 
is in each state. 

In terms of the solution of the equations of (b), find 

(c) the proportion of time the server is cutting hair; 

(d) the average arrival rate of entering customers. 

30. For the tandem queue model verify that 

$$P_{n,m} = (\lambda/\mu_1)^n (1 - \lambda/\mu_1)(\lambda/\mu_2)^m (1 - \lambda/\mu_2)$$

satisfies the balance Equation (8.15).

31. Consider a network of three stations with a single server at each station. Customers 
arrive at stations 1,2, 3 in accordance with Poisson processes having respective 
rates 5, 10, and 15. The service times at the three stations are exponential with 
respective rates 10, 50, and 100. A customer completing service at station 1 is 
equally likely to (i) go to station 2, (ii) go to station 3, or (iii) leave the system. 
A customer departing service at station 2 always goes to station 3. A departure 
from service at station 3 is equally likely to either go to station 2 or leave the 
system. 

(a) What is the average number of customers in the system (consisting of all three 
stations)? 

(b) What is the average time a customer spends in the system? 

32. Consider a closed queueing network consisting of two customers moving among
two servers, and suppose that after each service completion the customer is equally
likely to go to either server; that is, $P_{1,2} = P_{2,1} = \frac{1}{2}$. Let $\mu_i$ denote the
exponential service rate at server i, $i = 1, 2$.

(a) Determine the average number of customers at each server. 

(b) Determine the service completion rate for each server. 

33. Explain how a Markov chain Monte Carlo simulation using the Gibbs sampler 
can be utilized to estimate 

(a) the distribution of the amount of time spent at server j on a visit. 

Hint: Use the arrival theorem. 

(b) the proportion of time a customer is with server j (i.e., either in server j's
queue or in service with j).

34. For open queueing networks

(a) state and prove the equivalent of the arrival theorem; 

(b) derive an expression for the average amount of time a customer spends waiting 
in queues. 

35. Customers arrive at a single-server station in accordance with a Poisson process 
having rate $\lambda$. Each customer has a value. The successive values of customers are
independent and come from a uniform distribution on (0, 1). The service time of 
a customer having value x is a random variable with mean 3 + 4x and variance 5. 

(a) What is the average time a customer spends in the system? 

(b) What is the average time a customer having value x spends in the system? 

*36. Compare the M/G/1 system for first-come, first-served queue discipline with
one of last-come, first-served (for instance, in which units for service are taken
from the top of a stack). Would you think that the queue size, waiting time, and
busy-period distribution differ? What about their means? What if the queue
discipline was always to choose at random among those waiting? Intuitively, which
discipline would result in the smallest variance in the waiting time distribution?

37. In an M/G/1 queue,

(a) what proportion of departures leave behind 0 work? 

(b) what is the average work in the system as seen by a departure? 

38. For the M/G/1 queue, let $X_n$ denote the number in the system left behind by the
nth departure.

(a) If

$$
X_{n+1} = \begin{cases}
X_n - 1 + Y_n, & \text{if } X_n \ge 1 \\
Y_n, & \text{if } X_n = 0
\end{cases}
$$

what does $Y_n$ represent?

(b) Rewrite the preceding as

$$X_{n+1} = X_n - 1 + Y_n + \delta_n \qquad (8.64)$$

where

$$
\delta_n = \begin{cases}
1, & \text{if } X_n = 0 \\
0, & \text{if } X_n \ge 1
\end{cases}
$$

Take expectations and let $n \to \infty$ in Equation (8.64) to obtain

$$E[\delta_\infty] = 1 - \lambda E[S]$$

(c) Square both sides of Equation (8.64), take expectations, and then let $n \to \infty$
to obtain

$$E[X_\infty] = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])} + \lambda E[S]$$

(d) Argue that $E[X_\infty]$, the average number as seen by a departure, is equal to L.


*39. Consider an M/G/1 system in which the first customer in a busy period has
the service distribution $G_1$ and all others have distribution $G_2$. Let C denote the
number of customers in a busy period, and let S denote the service time of a
customer chosen at random.

Argue that

(a) $a_0 = P_0 = 1 - \lambda E[S]$.

(b) $E[S] = a_0 E[S_1] + (1 - a_0) E[S_2]$ where $S_i$ has distribution $G_i$.

(c) Use (a) and (b) to show that $E[B]$, the expected length of a busy period, is
given by

$$E[B] = \frac{E[S_1]}{1 - \lambda E[S_2]}$$

(d) Find $E[C]$.


40. Consider an M/G/1 system with $\lambda E[S] < 1$.

(a) Suppose that service is about to begin at a moment when there are n customers 
in the system. 

(i) Argue that the additional time until there are only $n - 1$ customers in the
system has the same distribution as a busy period. 

(ii) What is the expected additional time until the system is empty? 

(b) Suppose that the work in the system at some moment is A. We are interested
in the expected additional time until the system is empty; call it $E[T]$. Let
N denote the number of arrivals during the first A units of time.

(i) Compute $E[T \mid N]$.

(ii) Compute E[T]. 

41. Carloads of customers arrive at a single-server station in accordance with a Poisson
process with rate 4 per hour. The service times are exponentially distributed
with rate 20 per hour. If each carload contains either 1, 2, or 3 customers with 
respective probabilities $\frac{1}{4}$, $\frac{1}{2}$, and $\frac{1}{4}$, compute the average customer delay in queue.

42. In the two-class priority queueing model of Section 8.6.2, what is $W_Q$? Show that
$W_Q$ is less than it would be under FIFO if $E[S_1] < E[S_2]$ and greater than under
FIFO if $E[S_1] > E[S_2]$.

43. In a two-class priority queueing model suppose that a cost of $C_i$ per unit time is
incurred for each type i customer that waits in queue, $i = 1, 2$. Show that type 1
customers should be given priority over type 2 (as opposed to the reverse) if

$$\frac{E[S_1]}{C_1} < \frac{E[S_2]}{C_2}$$


44. Consider the priority queueing model of Section 8.6.2 but now suppose that if a
type 2 customer is being served when a type 1 arrives then the type 2 customer
is bumped out of service. This is called the preemptive case. Suppose that when
a bumped type 2 customer goes back in service his service begins at the point
where it left off when he was bumped.

(a) Argue that the work in the system at any time is the same as in the
nonpreemptive case.

(b) Derive $W_Q^1$.

Hint: How do type 2 customers affect type 1s?

(c) Why is it not true that

$$V_Q^2 = \lambda_2 E[S_2] W_Q^2$$

(d) Argue that the work seen by a type 2 arrival is the same as in the nonpreemptive
case, and so

$$W_Q^2 = W_Q^2(\text{nonpreemptive}) + E[\text{extra time}]$$

where the extra time is due to the fact that he may be bumped.

(e) Let N denote the number of times a type 2 customer is bumped. Why is

$$E[\text{extra time} \mid N] = \frac{N E[S_1]}{1 - \lambda_1 E[S_1]}$$

Hint: When a type 2 is bumped, relate the time until he gets back in service
to a "busy period."

(f) Let $S_2$ denote the service time of a type 2. What is $E[N \mid S_2]$?

(g) Combine the preceding to obtain

$$W_Q^2 = W_Q^2(\text{nonpreemptive}) + \frac{\lambda_1 E[S_1] E[S_2]}{1 - \lambda_1 E[S_1]}$$

*45. Calculate explicitly (not in terms of limiting probabilities) the average time a 
customer spends in the system in Exercise 24. 

46. In the G/M/1 model if G is exponential with rate $\lambda$ show that $\beta = \lambda/\mu$.

47. In the k server Erlang loss model, suppose that $\lambda = 1$ and $E[S] = 4$. Find L if
$P_k = .2$.

48. Verify the formula given for the $P_j$ of the M/M/k.

49. In the Erlang loss system suppose the Poisson arrival rate is $\lambda = 2$, and suppose
there are three servers, each of whom has a service distribution that is uniformly 
distributed over (0, 2). What proportion of potential customers is lost? 

50. In the M/M/k system, 

(a) what is the probability that a customer will have to wait in queue? 

(b) determine L and W. 

51. Verify the formula for the distribution of $W_Q$ given for the G/M/k model.

*52. Consider a system where the interarrival times have an arbitrary distribution F,
and there is a single server whose service distribution is G. Let $D_n$ denote the
amount of time the nth customer spends waiting in queue. Interpret $S_n$, $T_n$ so that

$$
D_{n+1} = \begin{cases}
D_n + S_n - T_n, & \text{if } D_n + S_n - T_n \ge 0 \\
0, & \text{if } D_n + S_n - T_n < 0
\end{cases}
$$


53. Consider a model in which the interarrival times have an arbitrary distribution F,
and there are k servers each having service distribution G. What condition on F
and G do you think would be necessary for there to exist limiting probabilities?

Reliability Theory




9.1 Introduction 

Reliability theory is concerned with determining the probability that a system, possibly 
consisting of many components, will function. We shall suppose that whether or not 
the system functions is determined solely from a knowledge of which components 
are functioning. For instance, a series system will function if and only if all of its 
components are functioning, while a parallel system will function if and only if at 
least one of its components is functioning. In Section 9.2, we explore the possible 
ways in which the functioning of the system may depend upon the functioning of its 
components. In Section 9.3, we suppose that each component will function with some 
known probability (independently of each other) and show how to obtain the probability 
that the system will function. As this probability often is difficult to explicitly compute, 
we also present useful upper and lower bounds in Section 9.4. In Section 9.5 we 
look at a system dynamically over time by supposing that each component initially 
functions and does so for a random length of time at which it fails. We then discuss 
the relationship between the distribution of the amount of time that a system functions 
and the distributions of the component lifetimes. In particular, it turns out that if the 
amount of time that a component functions has an increasing failure rate on the average 
(IFRA) distribution, then so does the distribution of system lifetime. In Section 9.6 we 
consider the problem of obtaining the mean lifetime of a system. In the final section
we analyze the system when failed components are subjected to repair.

Figure 9.1 A series system.


9.2 Structure Functions 


Consider a system consisting of n components, and suppose that each component
is either functioning or has failed. To indicate whether or not the ith component is
functioning, we define the indicator variable $x_i$ by

$$
x_i = \begin{cases}
1, & \text{if the $i$th component is functioning} \\
0, & \text{if the $i$th component has failed}
\end{cases}
$$

The vector $\mathbf{x} = (x_1, \ldots, x_n)$ is called the state vector. It indicates which of the
components are functioning and which have failed.

We further suppose that whether or not the system as a whole is functioning is
completely determined by the state vector $\mathbf{x}$. Specifically, it is supposed that there
exists a function $\phi(\mathbf{x})$ such that

$$
\phi(\mathbf{x}) = \begin{cases}
1, & \text{if the system is functioning when the state vector is } \mathbf{x} \\
0, & \text{if the system has failed when the state vector is } \mathbf{x}
\end{cases}
$$

The function $\phi(\mathbf{x})$ is called the structure function of the system.

Example 9.1 (The Series Structure) A series system functions if and only if all of 
its components are functioning. Hence, its structure function is given by 


$$\phi(\mathbf{x}) = \min(x_1, \ldots, x_n) = \prod_{i=1}^{n} x_i$$


We shall find it useful to represent the structure of a system in terms of a diagram. 
The relevant diagram for the series structure is shown in Figure 9.1. The idea is that 
if a signal is initiated at the left end of the diagram then in order for it to successfully 
reach the right end, it must pass through all of the components; hence, they must all be 
functioning. ■ 

Example 9.2 (The Parallel Structure) A parallel system functions if and only if at
least one of its components is functioning. Hence, its structure function is given by

$$\phi(\mathbf{x}) = \max(x_1, \ldots, x_n)$$


A parallel structure may be pictorially illustrated by Figure 9.2. This follows since 
a signal at the left end can successfully reach the right end as long as at least one 
component is functioning. ■ 

Figure 9.2 A parallel system.

Figure 9.3 A two-out-of-three system.

Example 9.3 (The k-out-of-n Structure) The series and parallel systems are both
special cases of a k-out-of-n system. Such a system functions if and only if at least
k of the n components are functioning. As $\sum_{i=1}^{n} x_i$ equals the number of functioning
components, the structure function of a k-out-of-n system is given by

$$
\phi(\mathbf{x}) = \begin{cases}
1, & \text{if } \sum_{i=1}^{n} x_i \ge k \\
0, & \text{if } \sum_{i=1}^{n} x_i < k
\end{cases}
$$

Series and parallel systems are respectively n-out-of-n and 1-out-of-n systems.

The two-out-of-three system may be diagrammed as shown in Figure 9.3. ■ 
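The structure functions of Examples 9.1 to 9.3 translate directly into code. The following is a minimal Python sketch; the function names `series`, `parallel`, and `k_out_of_n` are illustrative choices rather than notation from the text, and a state vector is assumed to be a sequence of 0s and 1s.

```python
from typing import Sequence


def series(x: Sequence[int]) -> int:
    """Series structure: functions only if every component functions."""
    return min(x)


def parallel(x: Sequence[int]) -> int:
    """Parallel structure: functions if at least one component functions."""
    return max(x)


def k_out_of_n(x: Sequence[int], k: int) -> int:
    """k-out-of-n structure: functions if at least k components function."""
    return 1 if sum(x) >= k else 0


state = (1, 0, 1)
# A series system is n-out-of-n and a parallel system is 1-out-of-n.
print(series(state) == k_out_of_n(state, len(state)))
print(parallel(state) == k_out_of_n(state, 1))
print(k_out_of_n(state, 2))
```

For the state vector (1, 0, 1), the two-out-of-three system functions while the series system does not.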

Example 9.4 (A Four-Component Structure) Consider a system consisting of four 
components, and suppose that the system functions if and only if components 1 and 2 
both function and at least one of components 3 and 4 function. Its structure function is 
given by 


$$\phi(\mathbf{x}) = x_1 x_2 \max(x_3, x_4)$$

Figure 9.4

Pictorially, the system is as shown in Figure 9.4. A useful identity, easily checked, is
that for binary variables,* $x_i$, $i = 1, \ldots, n$,

$$\max(x_1, \ldots, x_n) = 1 - \prod_{i=1}^{n} (1 - x_i)$$

When n = 2, this yields

$$\max(x_1, x_2) = 1 - (1 - x_1)(1 - x_2) = x_1 + x_2 - x_1 x_2$$

Hence, the structure function in the example may be written as

$$\phi(\mathbf{x}) = x_1 x_2 (x_3 + x_4 - x_3 x_4)$$ ■

It is natural to assume that replacing a failed component by a functioning one will
never lead to a deterioration of the system. In other words, it is natural to assume that the
structure function $\phi(\mathbf{x})$ is an increasing function of $\mathbf{x}$, that is, if $x_i \le y_i$, $i = 1, \ldots, n$,
then $\phi(\mathbf{x}) \le \phi(\mathbf{y})$. Such an assumption shall be made in this chapter and the system
will be called monotone.

9.2.1 Minimal Path and Minimal Cut Sets 

In this section we show how any system can be represented both as a series arrangement 
of parallel structures and as a parallel arrangement of series structures. As a preliminary, 
we need the following concepts. 

A state vector $\mathbf{x}$ is called a path vector if $\phi(\mathbf{x}) = 1$. If, in addition, $\phi(\mathbf{y}) = 0$ for all
$\mathbf{y} < \mathbf{x}$, then $\mathbf{x}$ is said to be a minimal path vector.** If $\mathbf{x}$ is a minimal path vector, then
the set $A = \{i : x_i = 1\}$ is called a minimal path set. In other words, a minimal path
set is a minimal set of components whose functioning ensures the functioning of the 
system. 

Example 9.5 Consider a five-component system whose structure is illustrated by
Figure 9.5. Its structure function equals

$$
\begin{aligned}
\phi(\mathbf{x}) &= \max(x_1, x_2)\,\max(x_3 x_4, x_5) \\
&= (x_1 + x_2 - x_1 x_2)(x_3 x_4 + x_5 - x_3 x_4 x_5)
\end{aligned}
$$


* A binary variable is one that assumes either the value 0 or 1.

** We say that $\mathbf{y} < \mathbf{x}$ if $y_i \le x_i$, $i = 1, \ldots, n$, with $y_i < x_i$ for some i.









Figure 9.5 


There are four minimal path sets, namely, {1, 3, 4}, {2, 3, 4}, {1,5}, {2, 5}. ■ 

Example 9.6 In a k-out-of-n system, there are $\binom{n}{k}$ minimal path sets, namely, all of
the sets consisting of exactly k components. ■ 
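The minimal path sets in Examples 9.5 and 9.6 can be recovered by brute force for small systems. The sketch below assumes a monotone structure function supplied as a Python callable on 0-1 tuples; the helper name `minimal_path_sets` is hypothetical, and the search is exponential in n, so it is only meant for small illustrations.

```python
from itertools import combinations
from math import comb


def minimal_path_sets(phi, n):
    """Return the minimal path sets (1-based labels) of a monotone structure phi."""
    found = []
    # Visit candidate sets by increasing size, so any path set that contains
    # a smaller path set has already been ruled out by an earlier find.
    for size in range(1, n + 1):
        for subset in combinations(range(1, n + 1), size):
            s = set(subset)
            if any(a <= s for a in found):
                continue  # contains a known minimal path set, so not minimal
            x = tuple(1 if i in s else 0 for i in range(1, n + 1))
            if phi(x) == 1:
                found.append(s)
    return found


def example_9_5(x):
    """Structure function of Example 9.5."""
    x1, x2, x3, x4, x5 = x
    return max(x1, x2) * max(x3 * x4, x5)


def two_out_of_three(x):
    return 1 if sum(x) >= 2 else 0


print(sorted(sorted(a) for a in minimal_path_sets(example_9_5, 5)))
print(len(minimal_path_sets(two_out_of_three, 3)) == comb(3, 2))
```

Because the structure is monotone, a set that functions and contains no previously found path set must itself be minimal, which is what justifies the early `continue`.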

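Let $A_1, \ldots, A_s$ denote the minimal path sets of a given system. We define $\alpha_j(\mathbf{x})$,
the indicator function of the jth minimal path set, by

$$
\alpha_j(\mathbf{x}) = \begin{cases}
1, & \text{if all the components of } A_j \text{ are functioning} \\
0, & \text{otherwise}
\end{cases}
\;=\; \prod_{i \in A_j} x_i
$$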


By definition, it follows that the system will function if all the components of at least
one minimal path set are functioning; that is, if $\alpha_j(\mathbf{x}) = 1$ for some j. On the other
hand, if the system functions, then the set of functioning components must include a
minimal path set. Therefore, a system will function if and only if all the components of
at least one minimal path set are functioning. Hence,

$$
\phi(\mathbf{x}) = \begin{cases}
1, & \text{if } \alpha_j(\mathbf{x}) = 1 \text{ for some } j \\
0, & \text{if } \alpha_j(\mathbf{x}) = 0 \text{ for all } j
\end{cases}
$$

or equivalently,

$$
\phi(\mathbf{x}) = \max_j \alpha_j(\mathbf{x}) = \max_j \prod_{i \in A_j} x_i \qquad (9.1)
$$


Since $\alpha_j(\mathbf{x})$ is a series structure function of the components of the jth minimal path
set, Equation (9.1) expresses an arbitrary system as a parallel arrangement of series 
systems. 

Example 9.7 Consider the system of Example 9.5. Because its minimal path sets are 
$A_1 = \{1, 3, 4\}$, $A_2 = \{2, 3, 4\}$, $A_3 = \{1, 5\}$, and $A_4 = \{2, 5\}$, we have by Equation
(9.1) that

$$
\begin{aligned}
\phi(\mathbf{x}) &= \max\{x_1 x_3 x_4,\; x_2 x_3 x_4,\; x_1 x_5,\; x_2 x_5\} \\
&= 1 - (1 - x_1 x_3 x_4)(1 - x_2 x_3 x_4)(1 - x_1 x_5)(1 - x_2 x_5)
\end{aligned}
$$

You should verify that this equals the value of $\phi(\mathbf{x})$ given in Example 9.5. (Make use
of the fact that, since $x_i$ equals 0 or 1, $x_i^2 = x_i$.) This representation may be pictured
as shown in Figure 9.6. ■

Figure 9.6

Figure 9.7 The bridge system.

Figure 9.8

Example 9.8 The system whose structure is as pictured in Figure 9.7 is called the
bridge system. Its minimal path sets are {1, 4}, {1, 3, 5}, {2, 5}, and {2, 3, 4}. Hence,
by Equation (9.1), its structure function may be expressed as

$$
\begin{aligned}
\phi(\mathbf{x}) &= \max\{x_1 x_4,\; x_1 x_3 x_5,\; x_2 x_5,\; x_2 x_3 x_4\} \\
&= 1 - (1 - x_1 x_4)(1 - x_1 x_3 x_5)(1 - x_2 x_5)(1 - x_2 x_3 x_4)
\end{aligned}
$$

This representation of $\phi(\mathbf{x})$ is diagrammed as shown in Figure 9.8. ■

A state vector $\mathbf{x}$ is called a cut vector if $\phi(\mathbf{x}) = 0$. If, in addition, $\phi(\mathbf{y}) = 1$ for all
$\mathbf{y} > \mathbf{x}$, then $\mathbf{x}$ is said to be a minimal cut vector. If $\mathbf{x}$ is a minimal cut vector, then the
set $C = \{i : x_i = 0\}$ is called a minimal cut set. In other words, a minimal cut set is a
minimal set of components whose failure ensures the failure of the system.

Figure 9.9

Let $C_1, \ldots, C_k$ denote the minimal cut sets of a given system. We define $\beta_j(\mathbf{x})$, the
indicator function of the jth minimal cut set, by

$$
\beta_j(\mathbf{x}) = \begin{cases}
1, & \text{if at least one component of the $j$th minimal cut set is functioning} \\
0, & \text{if all of the components of the $j$th minimal cut set are not functioning}
\end{cases}
\;=\; \max_{i \in C_j} x_i
$$

Since a system is not functioning if and only if all the components of at least one 
minimal cut set are not functioning, it follows that 

$$
\phi(\mathbf{x}) = \prod_{j=1}^{k} \beta_j(\mathbf{x}) = \prod_{j=1}^{k} \max_{i \in C_j} x_i \qquad (9.2)
$$

Since $\beta_j(\mathbf{x})$ is a parallel structure function of the components of the jth minimal cut
set, Equation (9.2) represents an arbitrary system as a series arrangement of parallel 
systems. 

Example 9.9 The minimal cut sets of the bridge structure shown in Figure 9.9 are 
{1,2}, {1, 3, 5}, {2, 3, 4}, and {4, 5}. Hence, from Equation (9.2), we may express 
$\phi(\mathbf{x})$ by

$$
\begin{aligned}
\phi(\mathbf{x}) &= \max(x_1, x_2)\,\max(x_1, x_3, x_5)\,\max(x_2, x_3, x_4)\,\max(x_4, x_5) \\
&= [1 - (1 - x_1)(1 - x_2)][1 - (1 - x_1)(1 - x_3)(1 - x_5)] \\
&\quad \times [1 - (1 - x_2)(1 - x_3)(1 - x_4)][1 - (1 - x_4)(1 - x_5)]
\end{aligned}
$$

This representation of $\phi(\mathbf{x})$ is pictorially expressed as Figure 9.10. ■
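Equations (9.1) and (9.2) give two representations of the same structure function, which can be confirmed for the bridge system by checking all 32 state vectors. This is a small Python sketch; the names `phi_from_paths` and `phi_from_cuts` are illustrative, and the path and cut sets are those listed in Examples 9.8 and 9.9.

```python
from itertools import product
from math import prod

PATH_SETS = [{1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4}]  # Example 9.8
CUT_SETS = [{1, 2}, {1, 3, 5}, {2, 3, 4}, {4, 5}]   # Example 9.9


def phi_from_paths(x, path_sets=PATH_SETS):
    """Equation (9.1): parallel arrangement of series systems."""
    return max(prod(x[i - 1] for i in a) for a in path_sets)


def phi_from_cuts(x, cut_sets=CUT_SETS):
    """Equation (9.2): series arrangement of parallel systems."""
    return prod(max(x[i - 1] for i in c) for c in cut_sets)


# Compare the two representations on every state vector in {0, 1}^5.
agree = all(
    phi_from_paths(x) == phi_from_cuts(x)
    for x in product((0, 1), repeat=5)
)
print(agree)
```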


9.3 Reliability of Systems of Independent Components 

In this section, we suppose that $X_i$, the state of the ith component, is a random variable
such that

$$P\{X_i = 1\} = p_i = 1 - P\{X_i = 0\}$$

Figure 9.10 Minimal cut representation of the bridge system.

The value $p_i$, which equals the probability that the ith component is functioning, is
called the reliability of the ith component. If we define r by

$$r = P\{\phi(\mathbf{X}) = 1\}, \quad \text{where } \mathbf{X} = (X_1, \ldots, X_n)$$

then r is called the reliability of the system. When the components, that is, the random
variables $X_i$, $i = 1, \ldots, n$, are independent, we may express r as a function of the
component reliabilities. That is,

$$r = r(\mathbf{p}), \quad \text{where } \mathbf{p} = (p_1, \ldots, p_n)$$

The function r(p) is called the reliability function. We shall assume throughout the 
remainder of this chapter that the components are independent. 

Example 9.10 (The Series System) The reliability function of the series system of 
n independent components is given by 

$$
\begin{aligned}
r(\mathbf{p}) &= P\{\phi(\mathbf{X}) = 1\} \\
&= P\{X_i = 1 \text{ for all } i = 1, \ldots, n\} \\
&= \prod_{i=1}^{n} p_i
\end{aligned}
$$ ■

Example 9.11 (The Parallel System) The reliability function of the parallel system 
of n independent components is given by 

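$$
\begin{aligned}
r(\mathbf{p}) &= P\{\phi(\mathbf{X}) = 1\} \\
&= P\{X_i = 1 \text{ for some } i = 1, \ldots, n\} \\
&= 1 - P\{X_i = 0 \text{ for all } i = 1, \ldots, n\} \\
&= 1 - \prod_{i=1}^{n} (1 - p_i)
\end{aligned}
$$ ■

Example 9.12 (The k-out-of-n System with Equal Probabilities) Consider a k-out-of-n
system. If $p_i = p$ for all $i = 1, \ldots, n$, then the reliability function is given by

$$
\begin{aligned}
r(p, \ldots, p) &= P\{\phi(\mathbf{X}) = 1\} \\
&= P\left\{\sum_{i=1}^{n} X_i \ge k\right\} \\
&= \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n-i}
\end{aligned}
$$ ■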




Example 9.13 (The Two-out-of-Three System) The reliability function of a two- 
out-of-three system is given by 

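$$
\begin{aligned}
r(\mathbf{p}) &= P\{\phi(\mathbf{X}) = 1\} \\
&= P\{\mathbf{X} = (1, 1, 1)\} + P\{\mathbf{X} = (1, 1, 0)\} + P\{\mathbf{X} = (1, 0, 1)\} + P\{\mathbf{X} = (0, 1, 1)\} \\
&= p_1 p_2 p_3 + p_1 p_2 (1 - p_3) + p_1 (1 - p_2) p_3 + (1 - p_1) p_2 p_3 \\
&= p_1 p_2 + p_1 p_3 + p_2 p_3 - 2 p_1 p_2 p_3
\end{aligned}
$$ ■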

Example 9.14 (The Three-out-of-Four System) The reliability function of a three- 
out-of-four system is given by 

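$$
\begin{aligned}
r(\mathbf{p}) &= P\{\mathbf{X} = (1, 1, 1, 1)\} + P\{\mathbf{X} = (1, 1, 1, 0)\} + P\{\mathbf{X} = (1, 1, 0, 1)\} \\
&\quad + P\{\mathbf{X} = (1, 0, 1, 1)\} + P\{\mathbf{X} = (0, 1, 1, 1)\} \\
&= p_1 p_2 p_3 p_4 + p_1 p_2 p_3 (1 - p_4) + p_1 p_2 (1 - p_3) p_4 \\
&\quad + p_1 (1 - p_2) p_3 p_4 + (1 - p_1) p_2 p_3 p_4 \\
&= p_1 p_2 p_3 + p_1 p_2 p_4 + p_1 p_3 p_4 + p_2 p_3 p_4 - 3 p_1 p_2 p_3 p_4
\end{aligned}
$$ ■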

Example 9.15 (A Five-Component System) Consider a five-component system that 
functions if and only if component 1, component 2, and at least one of the remaining 
components function. Its reliability function is given by 

$$
\begin{aligned}
r(\mathbf{p}) &= P\{X_1 = 1, X_2 = 1, \max(X_3, X_4, X_5) = 1\} \\
&= P\{X_1 = 1\} P\{X_2 = 1\} P\{\max(X_3, X_4, X_5) = 1\} \\
&= p_1 p_2 [1 - (1 - p_3)(1 - p_4)(1 - p_5)]
\end{aligned}
$$ ■

Since $\phi(\mathbf{X})$ is a 0-1 (that is, a Bernoulli) random variable, we may also compute
$r(\mathbf{p})$ by taking its expectation. That is,

$$r(\mathbf{p}) = P\{\phi(\mathbf{X}) = 1\} = E[\phi(\mathbf{X})]$$

Example 9.16 (A Four-Component System) A four-component system that func¬ 
tions when both components 1 and 4, and at least one of the other components function 
has its structure function given by 


$$\phi(\mathbf{x}) = x_1 x_4 \max(x_2, x_3)$$

Hence,

$$
\begin{aligned}
r(\mathbf{p}) &= E[\phi(\mathbf{X})] \\
&= E[X_1 X_4 (1 - (1 - X_2)(1 - X_3))] \\
&= p_1 p_4 [1 - (1 - p_2)(1 - p_3)]
\end{aligned}
$$ ■
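Because $r(\mathbf{p}) = E[\phi(\mathbf{X})]$, the reliability of a small system of independent components can be computed by summing $\phi(\mathbf{x})$ weighted by the probability of each state vector. The following Python sketch does this by full enumeration; the function name `reliability` and the numerical component reliabilities are assumptions made for illustration, and the cost grows as $2^n$.

```python
from itertools import product


def reliability(phi, p):
    """Compute r(p) = E[phi(X)] for independent components by enumeration."""
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        # Probability of state vector x under independence.
        weight = 1.0
        for xi, pi in zip(x, p):
            weight *= pi if xi == 1 else 1.0 - pi
        total += weight * phi(x)
    return total


def two_out_of_three(x):
    return 1 if sum(x) >= 2 else 0


def example_9_16(x):
    x1, x2, x3, x4 = x
    return x1 * x4 * max(x2, x3)


p3 = (0.9, 0.8, 0.7)  # illustrative values, not from the text
closed_form_2of3 = (p3[0] * p3[1] + p3[0] * p3[2] + p3[1] * p3[2]
                    - 2 * p3[0] * p3[1] * p3[2])
print(abs(reliability(two_out_of_three, p3) - closed_form_2of3) < 1e-12)

p4 = (0.9, 0.8, 0.7, 0.6)  # illustrative values, not from the text
closed_form_9_16 = p4[0] * p4[3] * (1 - (1 - p4[1]) * (1 - p4[2]))
print(abs(reliability(example_9_16, p4) - closed_form_9_16) < 1e-12)
```

Both comparisons check the enumeration against the closed forms derived in Examples 9.13 and 9.16.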

An important and intuitive property of the reliability function r (p) is given by the 
following proposition. 

Proposition 9.1 If r(p) is the reliability function of a system of independent compo¬ 
nents, then r(p) is an increasing function of p. 

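Proof. By conditioning on $X_i$ and using the independence of the components, we
obtain

$$
\begin{aligned}
r(\mathbf{p}) &= E[\phi(\mathbf{X})] \\
&= p_i E[\phi(\mathbf{X}) \mid X_i = 1] + (1 - p_i) E[\phi(\mathbf{X}) \mid X_i = 0] \\
&= p_i E[\phi(1_i, \mathbf{X})] + (1 - p_i) E[\phi(0_i, \mathbf{X})]
\end{aligned}
$$

where

$$
(1_i, \mathbf{X}) = (X_1, \ldots, X_{i-1}, 1, X_{i+1}, \ldots, X_n), \qquad
(0_i, \mathbf{X}) = (X_1, \ldots, X_{i-1}, 0, X_{i+1}, \ldots, X_n)
$$

Thus,

$$r(\mathbf{p}) = p_i E[\phi(1_i, \mathbf{X}) - \phi(0_i, \mathbf{X})] + E[\phi(0_i, \mathbf{X})]$$

However, since $\phi$ is an increasing function, it follows that

$$E[\phi(1_i, \mathbf{X}) - \phi(0_i, \mathbf{X})] \ge 0$$

and so the preceding is increasing in $p_i$ for all i. Hence, the result is proven. ■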

Let us now consider the following situation: A system consisting of n different 
components is to be built from a stockpile containing exactly two of each type of 
component. How should we use the stockpile so as to maximize our probability of 
attaining a functioning system? In particular, should we build two separate systems, in 
which case the probability of attaining a functioning one would be 

P {at least one of the two systems function} 

= 1 — P {neither of the systems function} 

= 1 - [(1 - r(p))(l - r(p'))] 

where $p_i$ ($p_i'$) is the probability that the first (second) number i component functions;
or should we build a single system whose ith component functions if at least one of
the number i components functions? In this latter case, the probability that the system
will function equals

$$r[\mathbf{1} - (\mathbf{1} - \mathbf{p})(\mathbf{1} - \mathbf{p}')]$$

since $1 - (1 - p_i)(1 - p_i')$ equals the probability that the ith component in the single
system will function.* We now show that replication at the component level is more
effective than replication at the system level.

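Theorem 9.1 For any reliability function r and vectors $\mathbf{p}$, $\mathbf{p}'$,

$$r[\mathbf{1} - (\mathbf{1} - \mathbf{p})(\mathbf{1} - \mathbf{p}')] \ge 1 - [1 - r(\mathbf{p})][1 - r(\mathbf{p}')]$$

Proof. Let $X_1, \ldots, X_n, X_1', \ldots, X_n'$ be mutually independent 0-1 random variables
with

$$p_i = P\{X_i = 1\}, \qquad p_i' = P\{X_i' = 1\}$$

Since $P\{\max(X_i, X_i') = 1\} = 1 - (1 - p_i)(1 - p_i')$, it follows that

$$r[\mathbf{1} - (\mathbf{1} - \mathbf{p})(\mathbf{1} - \mathbf{p}')] = E(\phi[\max(\mathbf{X}, \mathbf{X}')])$$

However, by the monotonicity of $\phi$, we have that $\phi[\max(\mathbf{X}, \mathbf{X}')]$ is greater than or equal
to both $\phi(\mathbf{X})$ and $\phi(\mathbf{X}')$ and hence is at least as large as $\max[\phi(\mathbf{X}), \phi(\mathbf{X}')]$. Hence,
from the preceding we have

$$
\begin{aligned}
r[\mathbf{1} - (\mathbf{1} - \mathbf{p})(\mathbf{1} - \mathbf{p}')] &\ge E[\max(\phi(\mathbf{X}), \phi(\mathbf{X}'))] \\
&= P\{\max[\phi(\mathbf{X}), \phi(\mathbf{X}')] = 1\} \\
&= 1 - P\{\phi(\mathbf{X}) = 0, \phi(\mathbf{X}') = 0\} \\
&= 1 - [1 - r(\mathbf{p})][1 - r(\mathbf{p}')]
\end{aligned}
$$

where the first equality follows from the fact that $\max[\phi(\mathbf{X}), \phi(\mathbf{X}')]$ is a 0-1 random
variable and hence its expectation equals the probability that it equals 1. ■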

As an illustration of the preceding theorem, suppose that we want to build a series 
system of two different types of components from a stockpile consisting of two of each 
of the kinds of components. Suppose that the reliability of each component is $\frac{1}{2}$. If
we use the stockpile to build two separate systems, then the probability of attaining a
working system is

$$1 - \left(\tfrac{3}{4}\right)^2 = \tfrac{7}{16}$$

while if we build a single system, replicating components, then the probability of
attaining a working system is

$$\left(\tfrac{3}{4}\right)^2 = \tfrac{9}{16}$$
Hence, replicating components leads to a higher reliability than replicating systems 
(as, of course, it must by Theorem 9.1). 
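The comparison in this illustration can be reproduced numerically. The sketch below assumes a two-component series system with every component reliability equal to 1/2, as in the text; the helper `reliability` is a hypothetical enumeration routine included only to keep the example self-contained.

```python
from itertools import product


def reliability(phi, p):
    """Compute r(p) = E[phi(X)] for independent components by enumeration."""
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        weight = 1.0
        for xi, pi in zip(x, p):
            weight *= pi if xi == 1 else 1.0 - pi
        total += weight * phi(x)
    return total


def series(x):
    return min(x)


p = (0.5, 0.5)
p_prime = (0.5, 0.5)

# Replication at the system level: two separate series systems.
system_level = 1 - (1 - reliability(series, p)) * (1 - reliability(series, p_prime))

# Replication at the component level: each position holds two components in parallel.
combined = tuple(1 - (1 - a) * (1 - b) for a, b in zip(p, p_prime))
component_level = reliability(series, combined)

print(system_level, component_level)
print(component_level >= system_level)
```

The two printed reliabilities correspond to 7/16 and 9/16, matching the inequality of Theorem 9.1.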

* Notation: If $\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$, then $\mathbf{xy} = (x_1 y_1, \ldots, x_n y_n)$. Also,
$\max(\mathbf{x}, \mathbf{y}) = (\max(x_1, y_1), \ldots, \max(x_n, y_n))$ and $\min(\mathbf{x}, \mathbf{y}) = (\min(x_1, y_1), \ldots, \min(x_n, y_n))$.

Figure 9.11


9.4 Bounds on the Reliability Function 

Consider the bridge system of Example 9.8, which is represented by Figure 9.11. Using
the minimal path representation, we have

$$\phi(\mathbf{x}) = 1 - (1 - x_1 x_4)(1 - x_1 x_3 x_5)(1 - x_2 x_5)(1 - x_2 x_3 x_4)$$

Hence,

$$r(\mathbf{p}) = 1 - E[(1 - X_1 X_4)(1 - X_1 X_3 X_5)(1 - X_2 X_5)(1 - X_2 X_3 X_4)]$$

However, since the minimal path sets overlap (that is, they have components in
common), the random variables $(1 - X_1 X_4)$, $(1 - X_1 X_3 X_5)$, $(1 - X_2 X_5)$, and $(1 - X_2 X_3 X_4)$
are not independent, and thus the expected value of their product is not equal to the
product of their expected values. Therefore, in order to compute $r(\mathbf{p})$, we must first
multiply the four random variables and then take the expected value. Doing so, using
that $X_i^2 = X_i$, we obtain

$$
\begin{aligned}
r(\mathbf{p}) &= E[X_1 X_4 + X_2 X_5 + X_1 X_3 X_5 + X_2 X_3 X_4 - X_1 X_2 X_3 X_4 \\
&\qquad - X_1 X_2 X_3 X_5 - X_1 X_2 X_4 X_5 - X_1 X_3 X_4 X_5 - X_2 X_3 X_4 X_5 \\
&\qquad + 2 X_1 X_2 X_3 X_4 X_5] \\
&= p_1 p_4 + p_2 p_5 + p_1 p_3 p_5 + p_2 p_3 p_4 - p_1 p_2 p_3 p_4 - p_1 p_2 p_3 p_5 \\
&\quad - p_1 p_2 p_4 p_5 - p_1 p_3 p_4 p_5 - p_2 p_3 p_4 p_5 + 2 p_1 p_2 p_3 p_4 p_5
\end{aligned}
$$

As can be seen by the preceding example, it is often quite tedious to evaluate r(p), 
and thus it would be useful if we had a simple way of obtaining bounds. We now 
consider two methods for this. 
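The expanded expression for the bridge reliability is easy to get wrong by hand, so a numerical cross-check against direct enumeration of $E[\phi(\mathbf{X})]$ is a useful sanity test. The Python sketch below is one way to do that; the function names and the sample reliabilities are illustrative assumptions.

```python
from itertools import product


def bridge_phi(x):
    """Bridge structure function via its minimal path sets (Example 9.8)."""
    x1, x2, x3, x4, x5 = x
    return max(x1 * x4, x1 * x3 * x5, x2 * x5, x2 * x3 * x4)


def bridge_by_enumeration(p):
    """r(p) = E[phi(X)], summing over all 32 state vectors."""
    total = 0.0
    for x in product((0, 1), repeat=5):
        weight = 1.0
        for xi, pi in zip(x, p):
            weight *= pi if xi == 1 else 1.0 - pi
        total += weight * bridge_phi(x)
    return total


def bridge_polynomial(p):
    """The expanded expression for r(p) obtained in the text."""
    p1, p2, p3, p4, p5 = p
    return (p1 * p4 + p2 * p5 + p1 * p3 * p5 + p2 * p3 * p4
            - p1 * p2 * p3 * p4 - p1 * p2 * p3 * p5 - p1 * p2 * p4 * p5
            - p1 * p3 * p4 * p5 - p2 * p3 * p4 * p5
            + 2 * p1 * p2 * p3 * p4 * p5)


p = (0.9, 0.8, 0.7, 0.6, 0.5)  # illustrative values, not from the text
print(abs(bridge_by_enumeration(p) - bridge_polynomial(p)) < 1e-12)
```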

9.4.1 Method of Inclusion and Exclusion 

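The following is a well-known formula for the probability of the union of the events
$E_1, E_2, \ldots, E_n$:

$$
P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum\sum_{i<j} P(E_i E_j) + \sum\sum\sum_{i<j<k} P(E_i E_j E_k)
- \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n) \qquad (9.3)
$$

A result, not as well known, is the following set of inequalities:

$$
\begin{aligned}
P\left(\bigcup_{i=1}^{n} E_i\right) &\le \sum_{i=1}^{n} P(E_i), \\
P\left(\bigcup_{i=1}^{n} E_i\right) &\ge \sum_{i} P(E_i) - \sum\sum_{i<j} P(E_i E_j), \\
P\left(\bigcup_{i=1}^{n} E_i\right) &\le \sum_{i} P(E_i) - \sum\sum_{i<j} P(E_i E_j) + \sum\sum\sum_{i<j<k} P(E_i E_j E_k), \\
&\ge \cdots \\
&\le \cdots
\end{aligned}
\qquad (9.4)
$$

where the inequality always changes direction as we add an additional term of the
expansion of $P\left(\bigcup_{i=1}^{n} E_i\right)$.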

Equation (9.3) is usually proven by induction on the number of events. However, 
let us now present another approach that will not only prove Equation (9.3) but also 
establish Inequalities (9.4). 

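To begin, define the indicator variables $I_j$, $j = 1, \ldots, n$, by

$$
I_j = \begin{cases}
1, & \text{if } E_j \text{ occurs} \\
0, & \text{otherwise}
\end{cases}
$$

Letting

$$N = \sum_{j=1}^{n} I_j$$

then N denotes the number of the $E_j$, $1 \le j \le n$, that occur. Also, let

$$
I = \begin{cases}
1, & \text{if } N > 0 \\
0, & \text{if } N = 0
\end{cases}
$$

Then, as

$$1 - I = (1 - 1)^N$$

we obtain, upon application of the binomial theorem, that

$$1 - I = \sum_{i=0}^{N} \binom{N}{i} (-1)^i$$

or

$$I = N - \binom{N}{2} + \binom{N}{3} - \cdots + (-1)^{N+1} \binom{N}{N} \qquad (9.5)$$

We now make use of the following combinatorial identity (which is easily established
by induction on i):

$$\binom{n}{i} - \binom{n}{i+1} + \cdots \pm \binom{n}{n} = \binom{n-1}{i-1} \ge 0, \quad 1 \le i \le n$$

The preceding thus implies that

$$\binom{N}{i} - \binom{N}{i+1} + \cdots \pm \binom{N}{N} \ge 0 \qquad (9.6)$$

From Equations (9.5) and (9.6) we obtain

$$
\begin{aligned}
I &\le N, && \text{by letting } i = 2 \text{ in (9.6)} \\
I &\ge N - \binom{N}{2}, && \text{by letting } i = 3 \text{ in (9.6)} \\
I &\le N - \binom{N}{2} + \binom{N}{3}
\end{aligned}
\qquad (9.7)
$$

and so on. Now, since $N \le n$ and $\binom{m}{i} = 0$ whenever $i > m$, we can rewrite
Equation (9.5) as

$$I = \sum_{i=1}^{n} \binom{N}{i} (-1)^{i+1} \qquad (9.8)$$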
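Equation (9.3) and Inequalities (9.4) now follow upon taking expectations of (9.7) and
(9.8). This is the case since

$$
\begin{aligned}
E[I] &= P\{N > 0\} = P\{\text{at least one of the } E_j \text{ occurs}\} = P\left(\bigcup_{j=1}^{n} E_j\right), \\
E[N] &= E\left[\sum_{j=1}^{n} I_j\right] = \sum_{j=1}^{n} P(E_j)
\end{aligned}
$$

Also,

$$
\begin{aligned}
E\left[\binom{N}{2}\right] &= E[\text{number of pairs of the } E_j \text{ that occur}] \\
&= E\left[\sum\sum_{i<j} I_i I_j\right] \\
&= \sum\sum_{i<j} P(E_i E_j)
\end{aligned}
$$

and, in general

$$
\begin{aligned}
E\left[\binom{N}{i}\right] &= E[\text{number of sets of size } i \text{ that occur}] \\
&= E\left[\sum\sum \cdots \sum_{j_1 < j_2 < \cdots < j_i} I_{j_1} I_{j_2} \cdots I_{j_i}\right] \\
&= \sum\sum \cdots \sum_{j_1 < j_2 < \cdots < j_i} P(E_{j_1} E_{j_2} \cdots E_{j_i})
\end{aligned}
$$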

The bounds expressed in (9.4) are commonly called the inclusion-exclusion
bounds. To apply them in order to obtain bounds on the reliability function, let
$A_1, A_2, \ldots, A_s$ denote the minimal path sets of a given structure $\phi$, and define the
events $E_1, E_2, \ldots, E_s$ by

$$E_i = \{\text{all components in } A_i \text{ function}\}$$

Now, since the system functions if and only if at least one of the events $E_i$ occurs, we
have

$$r(\mathbf{p}) = P\left(\bigcup_{i=1}^{s} E_i\right)$$

Applying (9.4) yields the desired bounds on $r(\mathbf{p})$. The terms in the summation are
computed as follows:

$$
\begin{aligned}
P(E_i) &= \prod_{l \in A_i} p_l, \\
P(E_i E_j) &= \prod_{l \in A_i \cup A_j} p_l, \\
P(E_i E_j E_k) &= \prod_{l \in A_i \cup A_j \cup A_k} p_l
\end{aligned}
$$

and so forth for intersections of more than three of the events. (The preceding follows
since, for instance, in order for the event $E_i E_j$ to occur, all of the components in $A_i$
and all of them in $A_j$ must function; or, in other words, all components in $A_i \cup A_j$
must function.)

When the $p_i$s are small the probabilities of the intersection of many of the events
$E_i$ should be quite small and the convergence should be relatively rapid.

Example 9.17 Consider the bridge structure with identical component probabilities. 
That is, take $p_i$ to equal p for all i. Letting $A_1 = \{1, 4\}$, $A_2 = \{1, 3, 5\}$, $A_3 = \{2, 5\}$,
and $A_4 = \{2, 3, 4\}$ denote the minimal path sets, we have

$$P(E_1) = P(E_3) = p^2, \qquad P(E_2) = P(E_4) = p^3$$

Also, because exactly five of the six $= \binom{4}{2}$ unions of $A_i$ and $A_j$ contain four components
(the exception being $A_2 \cup A_4$, which contains all five components), we have

$$
\begin{aligned}
P(E_1 E_2) &= P(E_1 E_3) = P(E_1 E_4) = P(E_2 E_3) = P(E_3 E_4) = p^4, \\
P(E_2 E_4) &= p^5
\end{aligned}
$$

Hence, the first two inclusion-exclusion bounds yield

$$2(p^2 + p^3) - 5p^4 - p^5 \le r(p) \le 2(p^2 + p^3)$$

where $r(p) = r(p, p, p, p, p)$. For instance, when p = 0.2, we have

$$0.08768 \le r(0.2) \le 0.09600$$

and, when p = 0.1,

$$0.02149 \le r(0.1) \le 0.02200$$ ■
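The successive inclusion-exclusion bounds can be generated mechanically from the minimal path sets, since the probability of an intersection of the $E_i$ is the product of the $p_l$ over the union of the corresponding path sets. The following Python sketch computes the partial sums for the bridge structure; the function name `inclusion_exclusion_partial_sums` and the dictionary representation of the component reliabilities are assumptions made for illustration.

```python
from itertools import combinations

PATH_SETS = [{1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4}]  # bridge structure


def inclusion_exclusion_partial_sums(path_sets, p):
    """Return the successive partial sums of (9.3) for r(p).

    p maps a component label to its reliability. The partial sums are
    alternately upper and lower bounds, and the last one equals r(p).
    """
    sums = []
    total = 0.0
    for m in range(1, len(path_sets) + 1):
        term = 0.0
        for group in combinations(path_sets, m):
            union = set().union(*group)
            prob = 1.0
            for label in union:
                prob *= p[label]
            term += prob
        total += (-1) ** (m + 1) * term
        sums.append(total)
    return sums


p = {i: 0.2 for i in range(1, 6)}
bounds = inclusion_exclusion_partial_sums(PATH_SETS, p)
upper, lower, exact = bounds[0], bounds[1], bounds[-1]
print(round(lower, 5), round(upper, 5))
print(lower <= exact <= upper)
```

With p = 0.2 the first two partial sums reproduce the bounds 0.09600 and 0.08768 quoted in the example.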

Just as we can define events in terms of the minimal path sets whose union is the 
event that the system functions, so can we define events in terms of the minimal cut sets 
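whose union is the event that the system fails. Let $C_1, C_2, \ldots, C_r$ denote the minimal
cut sets and define the events $F_1, \ldots, F_r$ by

$$F_i = \{\text{all components in } C_i \text{ are failed}\}$$

Now, because the system is failed if and only if all of the components of at least one
minimal cut set are failed, we have

$$
\begin{aligned}
1 - r(\mathbf{p}) &= P\left(\bigcup_{i} F_i\right), \\
1 - r(\mathbf{p}) &\ge \sum_{i} P(F_i) - \sum\sum_{i<j} P(F_i F_j), \\
1 - r(\mathbf{p}) &\le \sum_{i} P(F_i) - \sum\sum_{i<j} P(F_i F_j) + \sum\sum\sum_{i<j<k} P(F_i F_j F_k),
\end{aligned}
$$

and so on. As

$$
\begin{aligned}
P(F_i) &= \prod_{l \in C_i} (1 - p_l), \\
P(F_i F_j) &= \prod_{l \in C_i \cup C_j} (1 - p_l), \\
P(F_i F_j F_k) &= \prod_{l \in C_i \cup C_j \cup C_k} (1 - p_l)
\end{aligned}
$$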

the convergence should be relatively rapid when the p, s are large. 
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Example 9.18 (A Random Graph) Let us recall from Section 3.6.2 that a graph 
consists of a set N of nodes and a set A of pairs of nodes, called arcs. For any two 
nodes i and j we say that the sequence of arcs ( i , h), ■ ■ ■ , (ft, j) constitutes an 

i— j path. If there is an i—j path between all the (") pairs of nodes i and j, i j, then 

the graph is said to be connected. If we think of the nodes of a graph as representing 
geographical locations and the arcs as representing direct communication links between 
the nodes, then the graph will be connected if any two nodes can communicate with 
each other—if not directly, then at least through the use of intermediary nodes. 

A graph can always be subdivided into nonoverlapping connected subgraphs called 
components. For instance, the graph in Figure 9.12 with nodes N = {1, 2, 3, 4, 5, 6} and 
arcs $A = \{(1, 2), (1, 3), (2, 3), (4, 5)\}$ consists of three components (a graph consisting
of a single node is considered to be connected).

Consider now the random graph having nodes $1, 2, \ldots, n$, which is such that there
is an arc from node $i$ to node $j$ with probability $P_{ij}$. Assume in addition that the
occurrences of these arcs constitute independent events. That is, assume that the $\binom{n}{2}$
random variables $X_{ij}$, $i \ne j$, are independent where

$$X_{ij} = \begin{cases} 1, & \text{if } (i, j) \text{ is an arc} \\ 0, & \text{otherwise} \end{cases}$$

We are interested in the probability that this graph will be connected.

We can think of the preceding as being a reliability system of $\binom{n}{2}$ components,
each component corresponding to a potential arc. The component is said to work if the
corresponding arc is indeed an arc of the network, and the system is said to work if
the corresponding graph is connected. As the addition of an arc to a connected graph
cannot disconnect the graph, it follows that the structure so defined is monotone.

Let us start by determining the minimal path and minimal cut sets. It is easy to see 
that a graph will not be connected if and only if the set of nodes can be partitioned into 
two nonempty subsets $X$ and $X^c$ in such a way that there is no arc connecting a node
from $X$ with one from $X^c$. For instance, if there are six nodes and if there are no arcs
connecting any of the nodes 1, 2, 3, 4 with either 5 or 6, then clearly the graph will not
be connected. Thus, we see that any partition of the nodes into two nonempty subsets
$X$ and $X^c$ corresponds to the minimal cut set defined by

$$\{(i, j) : i \in X,\ j \in X^c\}$$

As there are $2^{n-1} - 1$ such partitions (there are $2^n - 2$ ways of choosing a nonempty
proper subset $X$ and, as the partition $X, X^c$ is the same as $X^c, X$, we must divide by 2)
there are therefore this number of minimal cut sets.

Figure 9.13

Figure 9.14

To determine the minimal path sets, we must characterize a minimal set of arcs that
results in a connected graph. The graph in Figure 9.13 is connected but it would remain
connected if any one of the arcs from the cycle shown in Figure 9.14 were removed. In
fact it is not difficult to see that the minimal path sets are exactly those sets of arcs that
result in a graph being connected but not having any cycles (a cycle being a path from
a node to itself). Such sets of arcs are called spanning trees (Figure 9.15). It is easily
verified that any spanning tree contains exactly $n - 1$ arcs, and it is a famous result in
graph theory (due to Cayley) that there are exactly $n^{n-2}$ of these minimal path sets.

Because of the large number of minimal path and minimal cut sets ($n^{n-2}$ and
$2^{n-1} - 1$, respectively), it is difficult to obtain any useful bounds without making further
restrictions. So, let us assume that all the $P_{ij}$ equal the common value $p$. That is, we
suppose that each of the possible arcs exists, independently, with the same probability $p$.
We shall start by deriving a recursive formula for the probability that the graph is
connected, which is computationally useful when $n$ is not too large, and then we shall
present an asymptotic formula for this probability when $n$ is large.

Let us denote by $P_n$ the probability that the random graph having $n$ nodes is connected.
To derive a recursive formula for $P_n$ we first concentrate attention on a single
node (say, node 1) and try to determine the probability that node 1 will be part of a
component of size $k$ in the resultant graph. Now, for a given set of $k - 1$ other nodes
these nodes along with node 1 will form a component if

(i) there are no arcs connecting any of these $k$ nodes with any of the remaining $n - k$
nodes;

(ii) the random graph restricted to these $k$ nodes (and $\binom{k}{2}$ potential arcs, each
independently appearing with probability $p$) is connected.

The probability that (i) and (ii) both occur is

$$q^{k(n-k)} P_k$$

where $q = 1 - p$. As there are $\binom{n-1}{k-1}$ ways of choosing $k - 1$ other nodes (to form
along with node 1 a component of size $k$) we see that

$$P\{\text{node 1 is part of a component of size } k\} = \binom{n-1}{k-1} q^{k(n-k)} P_k, \qquad k = 1, 2, \ldots, n$$

Now, since the sum of the foregoing probabilities as $k$ ranges from 1 through $n$ clearly
must equal 1, and as the graph is connected if and only if node 1 is part of a component
of size $n$, we see that

$$P_n = 1 - \sum_{k=1}^{n-1} \binom{n-1}{k-1} q^{k(n-k)} P_k, \qquad n = 2, 3, \ldots \qquad (9.9)$$

Starting with $P_1 = 1$, $P_2 = p$, Equation (9.9) can be used to determine $P_n$ recursively
when $n$ is not too large. It is particularly suited for numerical computation.
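The recursion in Equation (9.9) translates directly into a few lines of code. The following is a minimal Python sketch; the function name `connectivity_probabilities` is an illustrative choice, not something from the text.

```python
from math import comb


def connectivity_probabilities(n_max, p):
    """Return [P_1, ..., P_{n_max}] computed from the recursion (9.9)."""
    q = 1.0 - p
    P = [None, 1.0]  # P[0] is unused so that P[k] holds P_k; P_1 = 1
    for n in range(2, n_max + 1):
        # Probability that node 1 lies in a component of size k < n, summed over k.
        not_connected = sum(
            comb(n - 1, k - 1) * q ** (k * (n - k)) * P[k]
            for k in range(1, n)
        )
        P.append(1.0 - not_connected)
    return P[1:]


probs = connectivity_probabilities(20, 0.5)
print(probs[1])       # P_2, which should equal p
print(1 - probs[-1])  # 1 - P_20
```

For $n = 2$ the recursion returns $P_2 = p$, and for $n = 3$ it returns $p^3 + 3p^2 q$, the probability that at least two of the three possible arcs are present.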

To determine an asymptotic formula for $P_n$ when $n$ is large, first note from Equation
(9.9) that since $P_k \le 1$, we have

$$1 - P_n \le \sum_{k=1}^{n-1} \binom{n-1}{k-1} q^{k(n-k)}$$

As it can be shown that for $q < 1$ and $n$ sufficiently large,

$$\sum_{k=1}^{n-1} \binom{n-1}{k-1} q^{k(n-k)} \le (n + 1) q^{n-1}$$

we have that for $n$ large

$$1 - P_n \le (n + 1) q^{n-1} \qquad (9.10)$$

To obtain a bound in the other direction, we concentrate our attention on a particular
type of minimal cut set, namely, those that separate one node from all others in the
graph. Specifically, define the minimal cut set $C_i$ as

$$C_i = \{(i, j) : j \ne i\}$$

and define $F_i$ to be the event that all arcs in $C_i$ are not working (and thus, node $i$ is
isolated from the other nodes). Now,

$$1 - P_n = P(\text{graph is not connected}) \ge P\Bigl(\bigcup_i F_i\Bigr)$$

since, if any of the events $F_i$ occur, then the graph will be disconnected. By the
inclusion-exclusion bounds, we have

$$P\Bigl(\bigcup_i F_i\Bigr) \ge \sum_i P(F_i) - \sum\sum_{i<j} P(F_i F_j)$$

As $P(F_i)$ and $P(F_i F_j)$ are just the respective probabilities that a given set of $n - 1$ arcs
and a given set of $2n - 3$ arcs are not in the graph (why?), it follows that

$$P(F_i) = q^{n-1},$$

$$P(F_i F_j) = q^{2n-3}, \qquad i \ne j$$

and so

$$1 - P_n \ge n q^{n-1} - \binom{n}{2} q^{2n-3}$$

Combining this with Equation (9.10) yields that for $n$ sufficiently large,

$$n q^{n-1} - \binom{n}{2} q^{2n-3} \le 1 - P_n \le (n + 1) q^{n-1}$$

and as

$$\frac{\binom{n}{2} q^{2n-3}}{n q^{n-1}} \to 0$$

as $n \to \infty$, we see that, for large $n$,

$$1 - P_n \approx n q^{n-1}$$

Thus, for instance, when $n = 20$ and $p = \frac{1}{2}$, the probability that the random graph will
be connected is approximately given by

$$P_{20} \approx 1 - 20 \left(\tfrac{1}{2}\right)^{19} = 0.99996$$ ■


9.4.2 Second Method for Obtaining Bounds on r(p) 

Our second approach to obtaining bounds on $r(\mathbf{p})$ is based on expressing the desired
probability as the probability of the intersection of events. To do so, let $A_1, A_2, \ldots, A_s$
denote the minimal path sets as before, and define the events $D_i$, $i = 1, \ldots, s$ by

$$D_i = \{\text{at least one component in } A_i \text{ has failed}\}$$

Now since the system will have failed if and only if at least one component in each of
the minimal path sets has failed we have

$$1 - r(\mathbf{p}) = P(D_1 D_2 \cdots D_s) = P(D_1) P(D_2 \mid D_1) \cdots P(D_s \mid D_1 D_2 \cdots D_{s-1}) \qquad (9.11)$$

Now it is quite intuitive that the information that at least one component of $A_1$ is down
can only increase the probability that at least one component of $A_2$ is down (or else
leave the probability unchanged if $A_1$ and $A_2$ do not overlap). Hence, intuitively

$$P(D_2 \mid D_1) \ge P(D_2)$$

To prove this inequality, we write

$$P(D_2) = P(D_2 \mid D_1) P(D_1) + P(D_2 \mid D_1^c)(1 - P(D_1)) \qquad (9.12)$$

and note that

$$P(D_2 \mid D_1^c) = P\{\text{at least one failed in } A_2 \mid \text{all functioning in } A_1\}$$

$$= 1 - \prod_{j \in A_2,\ j \notin A_1} p_j$$

$$\le 1 - \prod_{j \in A_2} p_j$$

$$= P(D_2)$$

Hence, from Equation (9.12) we see that

$$P(D_2) \le P(D_2 \mid D_1) P(D_1) + P(D_2)(1 - P(D_1))$$

or

$$P(D_2 \mid D_1) \ge P(D_2)$$

By the same reasoning, it also follows that

$$P(D_i \mid D_1 \cdots D_{i-1}) \ge P(D_i)$$

and so from Equation (9.11) we have

$$1 - r(\mathbf{p}) \ge \prod_i P(D_i)$$

or, equivalently,

$$r(\mathbf{p}) \le 1 - \prod_i \Bigl(1 - \prod_{j \in A_i} p_j\Bigr)$$

To obtain a bound in the other direction, let $C_1, \ldots, C_r$ denote the minimal cut sets
and define the events $U_1, \ldots, U_r$ by

$$U_i = \{\text{at least one component in } C_i \text{ is functioning}\}$$

Then, since the system will function if and only if all of the events $U_i$ occur, we have

$$r(\mathbf{p}) = P(U_1 U_2 \cdots U_r) = P(U_1) P(U_2 \mid U_1) \cdots P(U_r \mid U_1 \cdots U_{r-1}) \ge \prod_i P(U_i)$$

where the last inequality is established in exactly the same manner as for the $D_i$. Hence,

$$r(\mathbf{p}) \ge \prod_i \Bigl[1 - \prod_{j \in C_i} (1 - p_j)\Bigr]$$

and we thus have the following bounds for the reliability function:

$$\prod_i \Bigl[1 - \prod_{j \in C_i} (1 - p_j)\Bigr] \le r(\mathbf{p}) \le 1 - \prod_i \Bigl(1 - \prod_{j \in A_i} p_j\Bigr) \qquad (9.13)$$

It is to be expected that the upper bound should be close to the actual $r(\mathbf{p})$ if there is
not too much overlap in the minimal path sets, and the lower bound to be close if there
is not too much overlap in the minimal cut sets.
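As a numerical illustration of the bounds (9.13), the following Python sketch evaluates them for the bridge structure of Example 9.17 and compares them with the exact reliability obtained by enumeration. The function names are illustrative, and the list of minimal cut sets for the bridge is supplied by hand here as an assumption of the sketch rather than taken from this section.

```python
from itertools import product
from math import prod

# Bridge structure: minimal path sets as in Example 9.17.
PATH_SETS = [{1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4}]
# Minimal cut sets of the same bridge, entered by hand (assumption of this sketch).
CUT_SETS = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]


def product_bounds(path_sets, cut_sets, p):
    """Lower and upper bounds on r(p) from (9.13)."""
    lower = prod(1 - prod(1 - p[j] for j in c) for c in cut_sets)
    upper = 1 - prod(1 - prod(p[j] for j in a) for a in path_sets)
    return lower, upper


def exact_reliability(path_sets, p):
    """Brute-force r(p) by summing over all 2^n component states."""
    comps = sorted(p)
    total = 0.0
    for state in product([0, 1], repeat=len(comps)):
        up = {c for c, x in zip(comps, state) if x}
        if any(a <= up for a in path_sets):
            total += prod(p[c] if x else 1 - p[c] for c, x in zip(comps, state))
    return total


p = {c: 0.9 for c in range(1, 6)}
lower, upper = product_bounds(PATH_SETS, CUT_SETS, p)
print(lower, exact_reliability(PATH_SETS, p), upper)
```

The same `product_bounds` function applies to any monotone structure once its minimal path and cut sets are listed.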

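Example 9.19 For the three-out-of-four system the minimal path sets are $A_1 =
\{1, 2, 3\}$, $A_2 = \{1, 2, 4\}$, $A_3 = \{1, 3, 4\}$, and $A_4 = \{2, 3, 4\}$; and the minimal cut sets
are $C_1 = \{1, 2\}$, $C_2 = \{1, 3\}$, $C_3 = \{1, 4\}$, $C_4 = \{2, 3\}$, $C_5 = \{2, 4\}$, and $C_6 = \{3, 4\}$.
Hence, by Equation (9.13) we have

$$(1 - q_1 q_2)(1 - q_1 q_3)(1 - q_1 q_4)(1 - q_2 q_3)(1 - q_2 q_4)(1 - q_3 q_4)$$

$$\le r(\mathbf{p}) \le 1 - (1 - p_1 p_2 p_3)(1 - p_1 p_2 p_4)(1 - p_1 p_3 p_4)(1 - p_2 p_3 p_4)$$

where $q_i = 1 - p_i$. For instance, if $p_i = \frac{1}{2}$ for all $i$, then the lower bound is
$(3/4)^6 \approx 0.18$ and the upper bound is $1 - (7/8)^4 \approx 0.41$, so the preceding yields

$$0.18 \le r\left(\tfrac{1}{2}, \ldots, \tfrac{1}{2}\right) \le 0.41$$

The exact value for this structure is easily computed to be

$$r\left(\tfrac{1}{2}, \ldots, \tfrac{1}{2}\right) = \binom{4}{3}\left(\tfrac{1}{2}\right)^4 + \left(\tfrac{1}{2}\right)^4 = \tfrac{5}{16} \approx 0.31$$ ■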



9.5 System Life as a Function of Component Lives 

For a random variable having distribution function $G$, we define $\bar{G}(a) \equiv 1 - G(a)$ to
be the probability that the random variable is greater than $a$.

Consider a system in which the $i$th component functions for a random length of
time having distribution $F_i$ and then fails. Once failed it remains in that state forever.
Assuming that the individual component lifetimes are independent, how can we express
the distribution of system lifetime as a function of the system reliability function $r(\mathbf{p})$
and the individual component lifetime distributions $F_i$, $i = 1, \ldots, n$?

To answer this we first note that the system will function for a length of time $t$
or greater if and only if it is still functioning at time $t$. That is, letting $F$ denote the
distribution of system lifetime, we have

$$\bar{F}(t) = P\{\text{system life} > t\} = P\{\text{system is functioning at time } t\}$$

But, by the definition of $r(\mathbf{p})$ we have

$$P\{\text{system is functioning at time } t\} = r(P_1(t), \ldots, P_n(t))$$

where

$$P_i(t) = P\{\text{component } i \text{ is functioning at } t\} = P\{\text{lifetime of } i > t\} = \bar{F}_i(t)$$

Hence, we see that

$$\bar{F}(t) = r(\bar{F}_1(t), \ldots, \bar{F}_n(t)) \qquad (9.14)$$

Example 9.20 In a series system, $r(\mathbf{p}) = \prod_{i=1}^{n} p_i$ and so from Equation (9.14)

$$\bar{F}(t) = \prod_{i=1}^{n} \bar{F}_i(t)$$

which is, of course, quite obvious since for a series system the system life is equal to
the minimum of the component lives and so will be greater than $t$ if and only if all
component lives are greater than $t$. ■

Example 9.21 In a parallel system $r(\mathbf{p}) = 1 - \prod_{i=1}^{n} (1 - p_i)$ and so

$$\bar{F}(t) = 1 - \prod_{i=1}^{n} F_i(t)$$

The preceding is also easily derived by noting that, in the case of a parallel system, the
system life is equal to the maximum of the component lives. ■
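Equation (9.14) says that the system survival function is obtained by plugging the component survival functions into the reliability function. A minimal Python sketch of this idea follows; the helper names (`exp_survival`, `system_survival`, and so on) and the choice of exponential component lifetimes are illustrative assumptions.

```python
from math import exp, prod


def series_r(p):
    """Reliability function of a series system."""
    return prod(p)


def parallel_r(p):
    """Reliability function of a parallel system."""
    return 1 - prod(1 - pi for pi in p)


def exp_survival(rate):
    """Survival function t -> exp(-rate * t) of an exponential lifetime."""
    return lambda t: exp(-rate * t)


def system_survival(r, component_survivals, t):
    """P{system life > t} = r(F-bar_1(t), ..., F-bar_n(t)), as in (9.14)."""
    return r([s(t) for s in component_survivals])


survivals = [exp_survival(1.0), exp_survival(2.0)]
print(system_survival(series_r, survivals, 0.7))
print(system_survival(parallel_r, survivals, 0.7))
```

For two exponential components with rates 1 and 2, the series result agrees with $e^{-3t}$ (the minimum of the two lifetimes is exponential with rate 3), and the parallel result agrees with $e^{-t} + e^{-2t} - e^{-3t}$.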

For a continuous distribution $G$, we define $\lambda(t)$, the failure rate function of $G$, by

$$\lambda(t) = \frac{g(t)}{\bar{G}(t)}$$

where $g(t) = \frac{d}{dt} G(t)$. In Section 5.2.2, it is shown that if $G$ is the distribution of
the lifetime of an item, then $\lambda(t)$ represents the probability intensity that a $t$-year-old
item will fail. We say that $G$ is an increasing failure rate (IFR) distribution if $\lambda(t)$ is
an increasing function of $t$. Similarly, we say that $G$ is a decreasing failure rate (DFR)
distribution if $\lambda(t)$ is a decreasing function of $t$.

Example 9.22 (The Weibull Distribution) A random variable is said to have the
Weibull distribution if its distribution is given, for some $\lambda > 0$, $\alpha > 0$, by

$$G(t) = 1 - e^{-(\lambda t)^{\alpha}}, \qquad t \ge 0$$

The failure rate function for a Weibull distribution equals

$$\lambda(t) = \frac{e^{-(\lambda t)^{\alpha}} \alpha (\lambda t)^{\alpha - 1} \lambda}{e^{-(\lambda t)^{\alpha}}} = \alpha \lambda (\lambda t)^{\alpha - 1}$$

Thus, the Weibull distribution is IFR when $\alpha \ge 1$, and DFR when $0 < \alpha \le 1$; when
$\alpha = 1$, $G(t) = 1 - e^{-\lambda t}$, the exponential distribution, which is both IFR and DFR. ■
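The closed form $\alpha \lambda (\lambda t)^{\alpha - 1}$ can be compared against the definition $g(t)/\bar{G}(t)$ by differentiating the distribution function numerically. The following Python sketch does this with a central difference; the function names and the step size `h` are illustrative assumptions.

```python
from math import exp


def weibull_cdf(t, lam, alpha):
    """G(t) = 1 - exp(-(lam * t) ** alpha) for t >= 0."""
    return 1 - exp(-((lam * t) ** alpha))


def weibull_failure_rate(t, lam, alpha):
    """Closed form of the Weibull failure rate: alpha * lam * (lam * t) ** (alpha - 1)."""
    return alpha * lam * (lam * t) ** (alpha - 1)


def numerical_failure_rate(cdf, t, h=1e-6):
    """g(t) / (1 - G(t)), with the density g approximated by a central difference."""
    density = (cdf(t + h) - cdf(t - h)) / (2 * h)
    return density / (1 - cdf(t))


lam, alpha = 0.5, 2.0
for t in (0.5, 1.5, 3.0):
    approx = numerical_failure_rate(lambda s: weibull_cdf(s, lam, alpha), t)
    print(t, weibull_failure_rate(t, lam, alpha), approx)
```

With $\alpha = 2$ the printed rates grow linearly in $t$, consistent with the IFR case; choosing $\alpha < 1$ instead gives decreasing rates.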

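Example 9.23 (The Gamma Distribution) A random variable is said to have a
gamma distribution if its density is given, for some $\lambda > 0$, $\alpha > 0$, by

$$g(t) = \frac{\lambda e^{-\lambda t} (\lambda t)^{\alpha - 1}}{\Gamma(\alpha)} \qquad \text{for } t \ge 0$$

where

$$\Gamma(\alpha) \equiv \int_0^{\infty} e^{-t} t^{\alpha - 1}\, dt$$

For the gamma distribution,

$$\frac{1}{\lambda(t)} = \frac{\bar{G}(t)}{g(t)} = \frac{\int_t^{\infty} \lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}\, dx}{\lambda e^{-\lambda t} (\lambda t)^{\alpha - 1}} = \int_t^{\infty} e^{-\lambda (x - t)} \left(\frac{x}{t}\right)^{\alpha - 1} dx$$

With the change of variables $u = x - t$, we obtain

$$\frac{1}{\lambda(t)} = \int_0^{\infty} e^{-\lambda u} \left(1 + \frac{u}{t}\right)^{\alpha - 1} du$$

Hence, $G$ is IFR when $\alpha \ge 1$ and is DFR when $0 < \alpha \le 1$. ■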

Suppose that the lifetime distribution of each component in a monotone system is
IFR. Does this imply that the system lifetime is also IFR? To answer this, let us at
first suppose that each component has the same lifetime distribution, which we denote
by $G$. That is, $F_i(t) = G(t)$, $i = 1, \ldots, n$. To determine whether the system lifetime
is IFR, we must compute $\lambda_F(t)$, the failure rate function of $F$. Now, by definition,

$$\lambda_F(t) = \frac{(d/dt) F(t)}{\bar{F}(t)} = \frac{(d/dt)[1 - r(\bar{G}(t))]}{r(\bar{G}(t))}$$

where

$$r(\bar{G}(t)) \equiv r(\bar{G}(t), \ldots, \bar{G}(t))$$

Hence,

$$\lambda_F(t) = \frac{r'(\bar{G}(t))}{r(\bar{G}(t))}\, G'(t) = \frac{\bar{G}(t)\, r'(\bar{G}(t))}{r(\bar{G}(t))} \cdot \frac{G'(t)}{\bar{G}(t)} = \lambda_G(t) \left. \frac{p\, r'(p)}{r(p)} \right|_{p = \bar{G}(t)} \qquad (9.15)$$

Since $\bar{G}(t)$ is a decreasing function of $t$, it follows from Equation (9.15) that if
each component of a coherent system has the same IFR lifetime distribution, then the
distribution of system lifetime will be IFR if $p\, r'(p)/r(p)$ is a decreasing function of $p$.

Example 9.24 (The $k$-out-of-$n$ System with Identical Components) Consider the
$k$-out-of-$n$ system, which will function if and only if $k$ or more components function.
When each component has the same probability $p$ of functioning, the number of
functioning components will have a binomial distribution with parameters $n$ and $p$.
Hence,

$$r(p) = \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n - i}$$

which, by continual integration by parts, can be shown to be equal to

$$r(p) = \frac{n!}{(k - 1)!\,(n - k)!} \int_0^p x^{k-1} (1 - x)^{n-k}\, dx$$

Upon differentiation, we obtain

$$r'(p) = \frac{n!}{(k - 1)!\,(n - k)!}\, p^{k-1} (1 - p)^{n-k}$$

Therefore,

$$\frac{p\, r'(p)}{r(p)} = \left[\frac{r(p)}{p\, r'(p)}\right]^{-1} = \left[\frac{1}{p} \int_0^p \left(\frac{x}{p}\right)^{k-1} \left(\frac{1 - x}{1 - p}\right)^{n-k} dx\right]^{-1}$$

Letting $y = x/p$ yields

$$\frac{p\, r'(p)}{r(p)} = \left[\int_0^1 y^{k-1} \left(\frac{1 - yp}{1 - p}\right)^{n-k} dy\right]^{-1}$$

Since $(1 - yp)/(1 - p)$ is increasing in $p$, it follows that $p\, r'(p)/r(p)$ is decreasing
in $p$. Thus, if a $k$-out-of-$n$ system is composed of independent, like components having
an increasing failure rate, the system itself has an increasing failure rate. ■

It turns out, however, that for a $k$-out-of-$n$ system, in which the independent components
have different IFR lifetime distributions, the system lifetime need not be IFR.
Consider the following example of a one-out-of-two (that is, a parallel) system.

Example 9.25 (A Parallel System That Is Not IFR) The life distribution of a parallel
system of two independent components, the $i$th component having an exponential
distribution with mean $1/i$, $i = 1, 2$, is given by

$$\bar{F}(t) = 1 - (1 - e^{-t})(1 - e^{-2t}) = e^{-2t} + e^{-t} - e^{-3t}$$

Therefore,

$$\lambda(t) = \frac{f(t)}{\bar{F}(t)} = \frac{2e^{-2t} + e^{-t} - 3e^{-3t}}{e^{-2t} + e^{-t} - e^{-3t}}$$

It easily follows upon differentiation that the sign of $\lambda'(t)$ is determined by $e^{-5t} -
e^{-3t} + 4e^{-4t}$, which is positive for small values and negative for large values of $t$.
Therefore, $\lambda(t)$ is initially strictly increasing, and then strictly decreasing. Hence, $F$ is
not IFR. ■
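The rise and fall of this failure rate is easy to see numerically. The following Python sketch tabulates $\lambda(t)$ for the two-component parallel system of Example 9.25 on a grid and locates its peak; the grid spacing and variable names are illustrative choices.

```python
from math import exp


def failure_rate(t):
    """lambda(t) = f(t) / F-bar(t) for exponential components with rates 1 and 2."""
    survival = exp(-t) + exp(-2 * t) - exp(-3 * t)
    density = exp(-t) + 2 * exp(-2 * t) - 3 * exp(-3 * t)
    return density / survival


grid = [0.01 * k for k in range(1, 1001)]  # t = 0.01, 0.02, ..., 10.00
rates = [failure_rate(t) for t in grid]
peak_index = rates.index(max(rates))
t_peak = grid[peak_index]
print(t_peak, rates[peak_index], rates[-1])
```

On this grid the failure rate increases up to a peak near $t \approx 1.44$ and decreases afterward, approaching 1 (the failure rate of the longer-lived component) as $t$ grows.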

Remark The result of the preceding example is quite surprising at first glance. To
obtain a better feel for it we need the concept of a mixture of distribution functions.
The distribution function $G$ is said to be a mixture of the distributions $G_1$ and $G_2$ if for
some $p$, $0 < p < 1$,

$$G(x) = p\, G_1(x) + (1 - p)\, G_2(x) \qquad (9.16)$$

Mixtures occur when we sample from a population made up of two distinct groups.
For example, suppose we have a stockpile of items of which the fraction $p$ are type 1
and the fraction $1 - p$ are type 2. Suppose that the lifetime distribution of type 1 items
is $G_1$ and of type 2 items is $G_2$. If we choose an item at random from the stockpile,
then its life distribution is as given by Equation (9.16).

Consider now a mixture of two exponential distributions having rates $\lambda_1$ and $\lambda_2$
where $\lambda_1 < \lambda_2$. We are interested in determining whether or not this mixture distribution
is IFR. To do so, we note that if the item selected has survived up to time $t$, then its
distribution of remaining life is still a mixture of the two exponential distributions. This
is so since its remaining life will still be exponential with rate $\lambda_1$ if it is type 1 or with
rate $\lambda_2$ if it is a type 2 item. However, the probability that it is a type 1 item is no longer
the (prior) probability $p$ but is now a conditional probability given that it has survived
to time $t$. In fact, its probability of being a type 1 is

$$P\{\text{type 1} \mid \text{life} > t\} = \frac{P\{\text{type 1, life} > t\}}{P\{\text{life} > t\}} = \frac{p e^{-\lambda_1 t}}{p e^{-\lambda_1 t} + (1 - p) e^{-\lambda_2 t}}$$

As the preceding is increasing in $t$, it follows that the larger $t$ is, the more likely it is
that the item in use is a type 1 (the better one, since $\lambda_1 < \lambda_2$). Hence, the older the
item is, the less likely it is to fail, and thus the mixture of exponentials, far from being
IFR, is in fact DFR.

Now, let us return to the parallel system of two exponential components having
respective rates $\lambda_1$ and $\lambda_2$. The lifetime of such a system can be expressed as the sum
of two independent random variables, namely,

$$\text{system life} = \operatorname{Exp}(\lambda_1 + \lambda_2) + \begin{cases} \operatorname{Exp}(\lambda_1) & \text{with probability } \dfrac{\lambda_2}{\lambda_1 + \lambda_2} \\[2ex] \operatorname{Exp}(\lambda_2) & \text{with probability } \dfrac{\lambda_1}{\lambda_1 + \lambda_2} \end{cases}$$

The first random variable, whose distribution is exponential with rate $\lambda_1 + \lambda_2$, represents
the time until one of the components fails, and the second, which is a mixture of
exponentials, is the additional time until the other component fails. (Why are these two
random variables independent?)

Now, given that the system has survived a time $t$, it is very unlikely when $t$ is large
that both components are still functioning, but instead it is far more likely that one
of the components has failed. Hence, for large $t$, the distribution of remaining life is
basically a mixture of two exponentials, and so as $t$ becomes even larger its failure
rate should decrease (as indeed occurs). ■


Recall that the failure rate function of a distribution $F(t)$ having density $f(t) =
F'(t)$ is defined by

$$\lambda(t) = \frac{f(t)}{1 - F(t)}$$

By integrating both sides, we obtain

$$\int_0^t \lambda(s)\, ds = \int_0^t \frac{f(s)}{1 - F(s)}\, ds = -\log \bar{F}(t)$$

Hence,

$$\bar{F}(t) = e^{-\Lambda(t)} \qquad (9.17)$$

where

$$\Lambda(t) = \int_0^t \lambda(s)\, ds$$

The function $\Lambda(t)$ is called the hazard function of the distribution $F$.

Definition 9.1 A distribution $F$ is said to have increasing failure on the average
(IFRA) if

$$\frac{\Lambda(t)}{t} = \frac{\int_0^t \lambda(s)\, ds}{t} \qquad (9.18)$$

increases in $t$ for $t \ge 0$.

In other words, Equation (9.18) states that the average failure rate up to time $t$
increases as $t$ increases. It is not difficult to show that if $F$ is IFR, then $F$ is IFRA; but
the reverse need not be true.

Note that $F$ is IFRA if $\Lambda(s)/s \le \Lambda(t)/t$ whenever $0 \le s \le t$, which is equivalent
to

$$\frac{\Lambda(\alpha t)}{\alpha t} \le \frac{\Lambda(t)}{t} \qquad \text{for } 0 \le \alpha \le 1, \text{ all } t \ge 0$$

But by Equation (9.17) we see that $\Lambda(t) = -\log \bar{F}(t)$, and so the preceding is equivalent
to

$$-\log \bar{F}(\alpha t) \le -\alpha \log \bar{F}(t)$$

or equivalently,

$$\log \bar{F}(\alpha t) \ge \log \bar{F}^{\alpha}(t)$$

which, since $\log x$ is a monotone function of $x$, shows that $F$ is IFRA if and only if

$$\bar{F}(\alpha t) \ge \bar{F}^{\alpha}(t) \qquad \text{for } 0 \le \alpha \le 1, \text{ all } t \ge 0 \qquad (9.19)$$

For a vector $\mathbf{p} = (p_1, \ldots, p_n)$ we define $\mathbf{p}^{\alpha} = (p_1^{\alpha}, \ldots, p_n^{\alpha})$. We shall need the
following proposition.

Proposition 9.2 Any reliability function $r(\mathbf{p})$ satisfies

$$r(\mathbf{p}^{\alpha}) \ge [r(\mathbf{p})]^{\alpha}, \qquad 0 \le \alpha \le 1$$

Proof. We prove this by induction on $n$, the number of components in the system. If
$n = 1$, then either $r(p) \equiv 0$, $r(p) \equiv 1$, or $r(p) \equiv p$. Hence, the proposition follows
in this case.

Assume that Proposition 9.2 is valid for all monotone systems of $n - 1$ components
and consider a system of $n$ components having structure function $\phi$. By conditioning
upon whether or not the $n$th component is functioning, we obtain

$$r(\mathbf{p}^{\alpha}) = p_n^{\alpha}\, r(1_n, \mathbf{p}^{\alpha}) + (1 - p_n^{\alpha})\, r(0_n, \mathbf{p}^{\alpha}) \qquad (9.20)$$

Now consider a system of components 1 through $n - 1$ having a structure function
$\phi_1(\mathbf{x}) = \phi(1_n, \mathbf{x})$. The reliability function for this system is given by $r_1(\mathbf{p}) = r(1_n, \mathbf{p})$;
hence, from the induction assumption (valid for all monotone systems of $n - 1$ components),
we have

$$r(1_n, \mathbf{p}^{\alpha}) \ge [r(1_n, \mathbf{p})]^{\alpha}$$

Similarly, by considering the system of components 1 through $n - 1$ and structure
function $\phi_0(\mathbf{x}) = \phi(0_n, \mathbf{x})$, we obtain

$$r(0_n, \mathbf{p}^{\alpha}) \ge [r(0_n, \mathbf{p})]^{\alpha}$$

Thus, from Equation (9.20), we obtain

$$r(\mathbf{p}^{\alpha}) \ge p_n^{\alpha} [r(1_n, \mathbf{p})]^{\alpha} + (1 - p_n^{\alpha}) [r(0_n, \mathbf{p})]^{\alpha}$$

which, by using the lemma to follow (with $\lambda = p_n$, $x = r(1_n, \mathbf{p})$, $y = r(0_n, \mathbf{p})$),
implies that

$$r(\mathbf{p}^{\alpha}) \ge [p_n\, r(1_n, \mathbf{p}) + (1 - p_n)\, r(0_n, \mathbf{p})]^{\alpha} = [r(\mathbf{p})]^{\alpha}$$

which proves the result. ■

Lemma 9.3 If $0 \le \alpha \le 1$, $0 \le \lambda \le 1$, then

$$h(y) = \lambda^{\alpha} x^{\alpha} + (1 - \lambda^{\alpha}) y^{\alpha} - (\lambda x + (1 - \lambda) y)^{\alpha} \ge 0$$

for all $0 \le y \le x$.

Proof. The proof is left as an exercise. ■

We are now ready to prove the following important theorem.

Theorem 9.2 For a monotone system of independent components, if each component
has an IFRA lifetime distribution, then the distribution of system lifetime is itself IFRA.

Proof. The distribution of system lifetime $F$ is given by

$$\bar{F}(\alpha t) = r(\bar{F}_1(\alpha t), \ldots, \bar{F}_n(\alpha t))$$

Hence, since $r$ is a monotone function, and since each of the component distributions
$F_i$ is IFRA, we obtain from Equation (9.19)

$$\bar{F}(\alpha t) \ge r(\bar{F}_1^{\alpha}(t), \ldots, \bar{F}_n^{\alpha}(t)) \ge [r(\bar{F}_1(t), \ldots, \bar{F}_n(t))]^{\alpha} = \bar{F}^{\alpha}(t)$$

which by Equation (9.19) proves the theorem. The last inequality followed, of course,
from Proposition 9.2. ■


9.6 Expected System Lifetime 

In this section, we show how the mean lifetime of a system can be determined, at least
in theory, from a knowledge of the reliability function $r(\mathbf{p})$ and the component lifetime
distributions $F_i$, $i = 1, \ldots, n$.

Since the system's lifetime will be $t$ or larger if and only if the system is still
functioning at time $t$, we have

$$P\{\text{system life} > t\} = r(\bar{\mathbf{F}}(t))$$

where $\bar{\mathbf{F}}(t) = (\bar{F}_1(t), \ldots, \bar{F}_n(t))$. Hence, by a well-known formula that states that for
any nonnegative random variable $X$,

$$E[X] = \int_0^{\infty} P\{X > x\}\, dx,$$

we see that*

$$E[\text{system life}] = \int_0^{\infty} r(\bar{\mathbf{F}}(t))\, dt \qquad (9.21)$$

Example 9.26 (A Series System of Uniformly Distributed Components) Consider
a series system of three independent components each of which functions for an amount
of time (in hours) uniformly distributed over (0, 10). Hence, $r(\mathbf{p}) = p_1 p_2 p_3$ and

$$F_i(t) = \begin{cases} t/10, & 0 \le t \le 10 \\ 1, & t > 10 \end{cases} \qquad i = 1, 2, 3$$

Therefore,

$$r(\bar{\mathbf{F}}(t)) = \begin{cases} \left(\dfrac{10 - t}{10}\right)^3, & 0 \le t \le 10 \\ 0, & t > 10 \end{cases}$$

and from Equation (9.21) we obtain

$$E[\text{system life}] = \int_0^{10} \left(\frac{10 - t}{10}\right)^3 dt = 10 \int_0^1 y^3\, dy = \frac{5}{2}$$ ■

Example 9.27 (A Two-out-of-Three System) Consider a two-out-of-three system of
independent components, in which each component's lifetime is (in months) uniformly
distributed over (0, 1). As was shown in Example 9.13, the reliability of such a system
is given by

$$r(\mathbf{p}) = p_1 p_2 + p_1 p_3 + p_2 p_3 - 2 p_1 p_2 p_3$$

* That $E[X] = \int_0^{\infty} P\{X > x\}\, dx$ can be shown as follows when $X$ has density $f$:

$$\int_0^{\infty} P\{X > x\}\, dx = \int_0^{\infty} \int_x^{\infty} f(y)\, dy\, dx = \int_0^{\infty} \int_0^{y} f(y)\, dx\, dy = \int_0^{\infty} y f(y)\, dy = E[X]$$

Since

$$F_i(t) = \begin{cases} t, & 0 \le t \le 1 \\ 1, & t > 1 \end{cases}$$

we see from Equation (9.21) that

$$E[\text{system life}] = \int_0^1 [3(1 - t)^2 - 2(1 - t)^3]\, dt = \int_0^1 (3y^2 - 2y^3)\, dy = 1 - \frac{1}{2} = \frac{1}{2}$$ ■
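Equation (9.21) also lends itself to direct numerical evaluation. The following Python sketch approximates the integral with a composite midpoint rule for the series system of Example 9.26 and the two-out-of-three system of Example 9.27; the helper names and the number of integration steps are illustrative assumptions.

```python
def integrate(f, a, b, steps=100_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def uniform_survival(upper):
    """Survival function of a lifetime uniformly distributed over (0, upper)."""
    return lambda t: max(0.0, 1.0 - t / upper) if t >= 0 else 1.0


def series3(p):
    """r(p) = p1 * p2 * p3."""
    return p[0] * p[1] * p[2]


def two_out_of_three(p):
    """r(p) = p1 p2 + p1 p3 + p2 p3 - 2 p1 p2 p3."""
    p1, p2, p3 = p
    return p1 * p2 + p1 * p3 + p2 * p3 - 2 * p1 * p2 * p3


def mean_system_life(r, survivals, horizon):
    """Integral of r(F-bar(t)) over [0, horizon], following Equation (9.21).

    horizon should be at least the largest possible component lifetime.
    """
    return integrate(lambda t: r([s(t) for s in survivals]), 0.0, horizon)


print(mean_system_life(series3, [uniform_survival(10.0)] * 3, 10.0))
print(mean_system_life(two_out_of_three, [uniform_survival(1.0)] * 3, 1.0))
```

The two printed values should be close to 5/2 and 1/2, the closed-form answers of the two examples.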


Example 9.28 (A Four-Component System) Consider the four-component system
that functions when components 1 and 2 and at least one of components 3 and 4
functions. Its structure function is given by

$$\phi(\mathbf{x}) = x_1 x_2 (x_3 + x_4 - x_3 x_4)$$

and thus its reliability function equals

$$r(\mathbf{p}) = p_1 p_2 (p_3 + p_4 - p_3 p_4)$$

Let us compute the mean system lifetime when the $i$th component is uniformly
distributed over $(0, i)$, $i = 1, 2, 3, 4$. Now,

$$\bar{F}_1(t) = \begin{cases} 1 - t, & 0 \le t \le 1 \\ 0, & t > 1 \end{cases}$$

$$\bar{F}_2(t) = \begin{cases} 1 - t/2, & 0 \le t \le 2 \\ 0, & t > 2 \end{cases}$$

$$\bar{F}_3(t) = \begin{cases} 1 - t/3, & 0 \le t \le 3 \\ 0, & t > 3 \end{cases}$$

$$\bar{F}_4(t) = \begin{cases} 1 - t/4, & 0 \le t \le 4 \\ 0, & t > 4 \end{cases}$$

Hence,

$$r(\bar{\mathbf{F}}(t)) = \begin{cases} (1 - t) \left(\dfrac{2 - t}{2}\right) \left[\dfrac{3 - t}{3} + \dfrac{4 - t}{4} - \dfrac{(3 - t)(4 - t)}{12}\right], & 0 \le t \le 1 \\ 0, & t > 1 \end{cases}$$

Therefore,

$$E[\text{system life}] = \frac{1}{24} \int_0^1 (1 - t)(2 - t)(12 - t^2)\, dt = \frac{593}{(24)(60)} \approx 0.41$$ ■

We end this section by obtaining the mean lifetime of a $k$-out-of-$n$ system of independent
identically distributed exponential components. If $\theta$ is the mean lifetime of
each component, then

$$\bar{F}_i(t) = e^{-t/\theta}$$

Hence, since for a $k$-out-of-$n$ system,

$$r(p, p, \ldots, p) = \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n-i}$$

we obtain from Equation (9.21)

$$E[\text{system life}] = \int_0^{\infty} \sum_{i=k}^{n} \binom{n}{i} (e^{-t/\theta})^i (1 - e^{-t/\theta})^{n-i}\, dt$$

Making the substitution

$$y = e^{-t/\theta}, \qquad dy = -\frac{1}{\theta} e^{-t/\theta}\, dt = -\frac{y}{\theta}\, dt$$

yields

$$E[\text{system life}] = \theta \sum_{i=k}^{n} \binom{n}{i} \int_0^1 y^{i-1} (1 - y)^{n-i}\, dy$$

Now, it is not difficult to show that*

$$\int_0^1 y^n (1 - y)^m\, dy = \frac{m!\, n!}{(m + n + 1)!} \qquad (9.22)$$

Thus, the foregoing equals

$$E[\text{system life}] = \theta \sum_{i=k}^{n} \frac{n!}{(n - i)!\, i!} \cdot \frac{(i - 1)!\,(n - i)!}{n!} = \theta \sum_{i=k}^{n} \frac{1}{i} \qquad (9.23)$$

* Let

$$C(n, m) = \int_0^1 y^n (1 - y)^m\, dy$$

Integration by parts yields $C(n, m) = [m/(n + 1)]\, C(n + 1, m - 1)$. Starting with $C(n, 0) = 1/(n + 1)$,
Equation (9.22) follows by mathematical induction.

Remark Equation (9.23) could have been proven directly by making use of special
properties of the exponential distribution. First note that the lifetime of a $k$-out-of-$n$
system can be written as $T_1 + \cdots + T_{n-k+1}$, where $T_i$ represents the time between
the $(i - 1)$st and $i$th failure. This is true since $T_1 + \cdots + T_{n-k+1}$ equals the time at
which the $(n - k + 1)$st component fails, which is also the first time that the number of
functioning components is less than $k$. Now, when all $n$ components are functioning,
the rate at which failures occur is $n/\theta$. That is, $T_1$ is exponentially distributed with
mean $\theta/n$. Similarly, since $T_i$ represents the time until the next failure when there are
$n - (i - 1)$ functioning components, it follows that $T_i$ is exponentially distributed with
mean $\theta/(n - i + 1)$. Hence, the mean system lifetime equals

$$E[T_1 + \cdots + T_{n-k+1}] = \theta \left[\frac{1}{n} + \cdots + \frac{1}{k}\right]$$

Note also that it follows, from the lack of memory of the exponential, that the $T_i$, $i =
1, \ldots, n - k + 1$, are independent random variables.
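As a sanity check on Equation (9.23), the following Python sketch compares the closed form $\theta \sum_{i=k}^{n} 1/i$ with a direct numerical evaluation of the integral in Equation (9.21). The function names, the truncation point of the integral, and the step count are illustrative assumptions.

```python
from math import comb, exp


def mean_life_formula(k, n, theta):
    """Equation (9.23): theta * sum_{i=k}^{n} 1/i."""
    return theta * sum(1.0 / i for i in range(k, n + 1))


def mean_life_numeric(k, n, theta, horizon_in_means=60.0, steps=200_000):
    """Midpoint-rule integral of r(e^{-t/theta}) over [0, horizon_in_means * theta].

    The integrand decays like exp(-k t / theta), so truncating the infinite
    integral at 60 mean lifetimes loses a negligible amount.
    """
    h = horizon_in_means * theta / steps
    total = 0.0
    for j in range(steps):
        p = exp(-(j + 0.5) * h / theta)  # component survival probability at the midpoint
        total += sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return h * total


print(mean_life_formula(2, 3, 1.0), mean_life_numeric(2, 3, 1.0))
```

For a two-out-of-three system with $\theta = 1$ both values should be close to $1/2 + 1/3$.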


9.6.1 An Upper Bound on the Expected Life of a Parallel System 

Consider a parallel system of $n$ components, whose lifetimes are not necessarily
independent. The system lifetime can be expressed as

$$\text{system life} = \max_i X_i$$

where $X_i$ is the lifetime of component $i$, $i = 1, \ldots, n$. We can bound the expected
system lifetime by making use of the following inequality. Namely, for any constant $c$

$$\max_i X_i \le c + \sum_{i=1}^{n} (X_i - c)^+ \qquad (9.24)$$

where $x^+$, the positive part of $x$, is equal to $x$ if $x > 0$ and is equal to 0 if $x \le 0$. The
validity of Inequality (9.24) is immediate since if $\max X_i < c$ then the left side is equal
to $\max X_i$ and the right side is equal to $c$. On the other hand, if $X_{(n)} = \max X_i > c$ then
the right side is at least as large as $c + (X_{(n)} - c) = X_{(n)}$. It follows from Inequality
(9.24), upon taking expectations, that

$$E\Bigl[\max_i X_i\Bigr] \le c + \sum_{i=1}^{n} E[(X_i - c)^+] \qquad (9.25)$$

Now, $(X_i - c)^+$ is a nonnegative random variable and so

$$E[(X_i - c)^+] = \int_0^{\infty} P\{(X_i - c)^+ > x\}\, dx = \int_0^{\infty} P\{X_i - c > x\}\, dx = \int_c^{\infty} P\{X_i > y\}\, dy$$

Thus, we obtain

$$E\Bigl[\max_i X_i\Bigr] \le c + \sum_{i=1}^{n} \int_c^{\infty} P\{X_i > y\}\, dy \qquad (9.26)$$

Because the preceding is true for all $c$, it follows that we obtain the best bound by
letting $c$ equal the value that minimizes the right side of the preceding. To determine
that value, differentiate the right side of the preceding and set the result equal to 0, to
obtain

$$1 - \sum_{i=1}^{n} P\{X_i > c\} = 0$$

That is, the minimizing value of $c$ is that value $c^*$ for which

$$\sum_{i=1}^{n} P\{X_i > c^*\} = 1$$

Since $\sum_{i=1}^{n} P\{X_i > c\}$ is a decreasing function of $c$, the value of $c^*$ can be easily
approximated and then utilized in Inequality (9.26). Also, it is interesting to note that $c^*$
is such that the expected number of the $X_i$ that exceed $c^*$ is equal to 1 (see Exercise 32).
That the optimal value of $c$ has this property is interesting and somewhat intuitive
inasmuch as Inequality (9.24) is an equality when exactly one of the $X_i$ exceeds $c$.

Example 9.29 Suppose the lifetime of component $i$ is exponentially distributed with
rate $\lambda_i$, $i = 1, \ldots, n$. Then the minimizing value of $c$ is such that

$$1 = \sum_{i=1}^{n} P\{X_i > c^*\} = \sum_{i=1}^{n} e^{-\lambda_i c^*}$$

and the resulting bound of the mean system life is

$$E\Bigl[\max_i X_i\Bigr] \le c^* + \sum_{i=1}^{n} E[(X_i - c^*)^+]$$

$$= c^* + \sum_{i=1}^{n} \bigl(E[(X_i - c^*)^+ \mid X_i > c^*]\, P\{X_i > c^*\} + E[(X_i - c^*)^+ \mid X_i \le c^*]\, P\{X_i \le c^*\}\bigr)$$

$$= c^* + \sum_{i=1}^{n} \frac{1}{\lambda_i} e^{-\lambda_i c^*}$$

In the special case where all the rates are equal, say, $\lambda_i = \lambda$, $i = 1, \ldots, n$, then

$$1 = n e^{-\lambda c^*} \qquad \text{or} \qquad c^* = \frac{1}{\lambda} \log(n)$$

and the bound is

$$E\Bigl[\max_i X_i\Bigr] \le \frac{1}{\lambda} (\log(n) + 1)$$

That is, if $X_1, \ldots, X_n$ are identically distributed exponential random variables with
rate $\lambda$, then the preceding gives a bound on the expected value of their maximum. In
the special case where these random variables are also independent, the following exact
expression, given by Equation (9.23) with $k = 1$ and $\theta = 1/\lambda$, is not much less than the
preceding upper bound:

$$E\Bigl[\max_i X_i\Bigr] = \frac{1}{\lambda} \sum_{i=1}^{n} \frac{1}{i} \approx \frac{1}{\lambda} \int_1^n \frac{1}{x}\, dx = \frac{1}{\lambda} \log(n)$$ ■
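When the rates differ, $c^*$ has no closed form, but because $\sum_i e^{-\lambda_i c}$ is decreasing in $c$ it can be found by bisection. The following Python sketch does this and then evaluates the bound $c^* + \sum_i e^{-\lambda_i c^*}/\lambda_i$; the function names and the fixed iteration count are illustrative assumptions.

```python
from math import exp, log


def optimal_c(rates, iterations=200):
    """Solve sum_i exp(-rate_i * c) = 1 for c >= 0 by bisection."""

    def excess(c):
        return sum(exp(-lam * c) for lam in rates) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:  # excess is decreasing in c: expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def max_life_upper_bound(rates):
    """Upper bound on E[max X_i] for exponential lifetimes with the given rates."""
    c = optimal_c(rates)
    return c + sum(exp(-lam * c) / lam for lam in rates)


n, lam = 10, 2.0
print(max_life_upper_bound([lam] * n), (log(n) + 1) / lam)
print(sum(1.0 / i for i in range(1, n + 1)) / lam)  # exact mean under independence
```

In the equal-rate case the first two printed numbers should coincide with $(\log n + 1)/\lambda$, and the third, the exact mean of the maximum for independent components, is somewhat smaller.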


9.7 Systems with Repair 


Consider an $n$-component system having reliability function $r(\mathbf{p})$. Suppose that
component $i$ functions for an exponentially distributed time with rate $\lambda_i$ and then fails;
once failed it takes an exponential time with rate $\mu_i$ to be repaired, $i = 1, \ldots, n$. All
components act independently.

Let us suppose that all components are initially working, and let

$$A(t) = P\{\text{system is working at } t\}$$

$A(t)$ is called the availability at time $t$. Since the components act independently, $A(t)$
can be expressed in terms of the reliability function as follows:

$$A(t) = r(A_1(t), \ldots, A_n(t)) \qquad (9.27)$$

where

$$A_i(t) = P\{\text{component } i \text{ is functioning at } t\}$$

Now the state of component $i$ (either on or off) changes in accordance with a two-state
continuous time Markov chain. Hence, from the results of Example 6.12 we have

$$A_i(t) = P_{00}(t) = \frac{\mu_i}{\mu_i + \lambda_i} + \frac{\lambda_i}{\mu_i + \lambda_i} e^{-(\lambda_i + \mu_i) t}$$

Thus, we obtain

$$A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\mu} + \boldsymbol{\lambda}} + \frac{\boldsymbol{\lambda}}{\boldsymbol{\mu} + \boldsymbol{\lambda}} e^{-(\boldsymbol{\lambda} + \boldsymbol{\mu}) t}\right)$$

If we let $t$ approach $\infty$, then we obtain the limiting availability (call it $A$), which is
given by

$$A = \lim_{t \to \infty} A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)$$

Remarks

(i) If the on and off distributions for component $i$ are arbitrary continuous distributions
with respective means $1/\lambda_i$ and $1/\mu_i$, $i = 1, \ldots, n$, then it follows from the theory
of alternating renewal processes (see Section 7.5.1) that

$$A_i(t) \to \frac{1/\lambda_i}{1/\lambda_i + 1/\mu_i} = \frac{\mu_i}{\mu_i + \lambda_i}$$

and thus using the continuity of the reliability function, it follows from (9.27) that
the limiting availability is

$$A = \lim_{t \to \infty} A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\mu} + \boldsymbol{\lambda}}\right)$$

Hence, $A$ depends only on the on and off distributions through their means.

(ii) It can be shown (using the theory of regenerative processes as presented in Section
7.5) that $A$ will also equal the long-run proportion of time that the system will be
functioning.

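Example 9.30 For a series system, $r(\mathbf{p}) = \prod_{i=1}^{n} p_i$ and so

$$A(t) = \prod_{i=1}^{n} \left[ \frac{\mu_i}{\mu_i + \lambda_i} + \frac{\lambda_i}{\mu_i + \lambda_i} e^{-(\lambda_i + \mu_i)t} \right]$$

and

$$A = \prod_{i=1}^{n} \frac{\mu_i}{\mu_i + \lambda_i}$$

Example 9.31 For a parallel system, $r(\mathbf{p}) = 1 - \prod_{i=1}^{n} (1 - p_i)$ and thus

$$A(t) = 1 - \prod_{i=1}^{n} \left[ \frac{\lambda_i}{\mu_i + \lambda_i} \left( 1 - e^{-(\lambda_i + \mu_i)t} \right) \right]$$

and

$$A = 1 - \prod_{i=1}^{n} \frac{\lambda_i}{\mu_i + \lambda_i}$$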
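The availability formulas for series and parallel systems are straightforward to evaluate. Here is a small illustrative sketch; the function names and the sample rates are assumptions made for this example, not taken from the text.

```python
import math

def component_availability(lam, mu, t):
    """A_i(t) for exponential on times (rate lam) and repair times (rate mu), starting on."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

def series_availability(lams, mus, t):
    """A(t) for a series system: product of the component availabilities."""
    return math.prod(component_availability(l, m, t) for l, m in zip(lams, mus))

def parallel_availability(lams, mus, t):
    """A(t) for a parallel system: one minus the probability that all components are off."""
    return 1.0 - math.prod(1.0 - component_availability(l, m, t) for l, m in zip(lams, mus))

lams, mus = [1.0, 2.0], [3.0, 2.0]   # hypothetical failure and repair rates
for t in (0.0, 0.5, 1000.0):
    print(t, series_availability(lams, mus, t), parallel_availability(lams, mus, t))
```

At $t = 0$ both systems are available with probability 1, and for large $t$ the values should settle at the limiting availabilities $\prod_i \mu_i/(\mu_i + \lambda_i)$ and $1 - \prod_i \lambda_i/(\mu_i + \lambda_i)$.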


The preceding system will alternate between periods when it is up and periods when it is down. Let us denote by $U_i$ and $D_i$, $i \ge 1$, the lengths of the $i$th up and down period respectively. For instance, in a two-out-of-three system, $U_1$ will be the time until two components are down; $D_1$, the additional time until two are up; $U_2$, the additional time until two are down, and so on. Let

$$\bar{U} = \lim_{n \to \infty} \frac{U_1 + \cdots + U_n}{n}, \qquad \bar{D} = \lim_{n \to \infty} \frac{D_1 + \cdots + D_n}{n}$$

denote the average length of an up and down period respectively.*

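To determine $\bar{U}$ and $\bar{D}$, note first that in the first $n$ up-down cycles (that is, in time $\sum_{i=1}^{n} (U_i + D_i)$) the system will be up for a time $\sum_{i=1}^{n} U_i$. Hence, the proportion of time the system will be up in the first $n$ up-down cycles is

$$\frac{U_1 + \cdots + U_n}{U_1 + \cdots + U_n + D_1 + \cdots + D_n} = \frac{\sum_{i=1}^{n} U_i/n}{\sum_{i=1}^{n} U_i/n + \sum_{i=1}^{n} D_i/n}$$

As $n \to \infty$, this must converge to $A$, the long-run proportion of time the system is up. Hence,

$$\frac{\bar{U}}{\bar{U} + \bar{D}} = A = r\left( \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) \qquad (9.28)$$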


However, to solve for $\bar{U}$ and $\bar{D}$ we need a second equation. To obtain one, consider the rate at which the system fails. As there will be $n$ failures in time $\sum_{i=1}^{n} (U_i + D_i)$, it follows that the rate at which the system fails is

$$\text{rate at which system fails} = \lim_{n \to \infty} \frac{n}{\sum_{1}^{n} U_i + \sum_{1}^{n} D_i} = \lim_{n \to \infty} \frac{1}{\sum_{1}^{n} U_i/n + \sum_{1}^{n} D_i/n} = \frac{1}{\bar{U} + \bar{D}} \qquad (9.29)$$

That is, the foregoing yields the intuitive result that, on average, there is one failure every $\bar{U} + \bar{D}$ time units. To utilize this, let us determine the rate at which a failure of component $i$ causes the system to go from up to down. Now, the system will go from up to down when component $i$ fails if the states of the other components $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ are such that $\phi(1_i, \mathbf{x}) = 1$, $\phi(0_i, \mathbf{x}) = 0$. That is, the states of the other components must be such that

$$\phi(1_i, \mathbf{x}) - \phi(0_i, \mathbf{x}) = 1 \qquad (9.30)$$


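Since component $i$ will, on average, have one failure every $1/\lambda_i + 1/\mu_i$ time units, it follows that the rate at which component $i$ fails is equal to $(1/\lambda_i + 1/\mu_i)^{-1} = \lambda_i \mu_i/(\lambda_i + \mu_i)$. In addition, the states of the other components will be such that (9.30) holds with probability

$$P\{\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty)) = 1\}$$

$$= E[\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty))] \quad \text{since } \phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty)) \text{ is a Bernoulli random variable}$$

$$= r\left( 1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) - r\left( 0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right)$$

Hence, putting the preceding together we see that

$$\text{rate at which component } i \text{ causes the system to fail} = \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[ r\left( 1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) - r\left( 0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) \right]$$

* It can be shown using the theory of regenerative processes that, with probability 1, the preceding limits will exist and will be constants.

Summing this over all components $i$ thus gives

$$\text{rate at which system fails} = \sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[ r\left( 1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) - r\left( 0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) \right]$$

Finally, equating the preceding with (9.29) yields

$$\frac{1}{\bar{U} + \bar{D}} = \sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[ r\left( 1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) - r\left( 0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) \right] \qquad (9.31)$$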


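Solving (9.28) and (9.31), we obtain

$$\bar{U} = \frac{r\left( \dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right)}{\displaystyle\sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[ r\left( 1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) - r\left( 0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) \right]} \qquad (9.32)$$

$$\bar{D} = \frac{\left[ 1 - r\left( \dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right) \right] \bar{U}}{r\left( \dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}} \right)} \qquad (9.33)$$

Also, (9.31) yields the rate at which the system fails.

Remark In establishing the formulas for $\bar{U}$ and $\bar{D}$, we did not make use of the assumption of exponential on and off times, and in fact our derivation is valid and Equations (9.32) and (9.33) hold whenever $\bar{U}$ and $\bar{D}$ are well defined (a sufficient condition is that all on and off distributions are continuous). The quantities $\lambda_i$, $\mu_i$, $i = 1, \ldots, n$, will represent, respectively, the reciprocals of the mean lifetimes and mean repair times.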

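Example 9.32 For a series system,

$$\bar{U} = \frac{\prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}}{\displaystyle\sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \prod_{j \ne i} \frac{\mu_j}{\mu_j + \lambda_j}} = \frac{1}{\sum_i \lambda_i}$$

$$\bar{D} = \frac{1 - \prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}}{\prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}} \times \frac{1}{\sum_i \lambda_i}$$

whereas for a parallel system, Equations (9.32) and (9.33) with $r(\mathbf{p}) = 1 - \prod_i (1 - p_i)$ give

$$\bar{U} = \frac{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{\displaystyle\sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \prod_{j \ne i} \frac{\lambda_j}{\mu_j + \lambda_j}} = \frac{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{\prod_j \dfrac{\lambda_j}{\mu_j + \lambda_j}} \times \frac{1}{\sum_i \mu_i}$$

$$\bar{D} = \frac{\prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}\, \bar{U} = \frac{1}{\sum_i \mu_i}$$

The preceding formulas hold for arbitrary continuous up and down distributions with $1/\lambda_i$ and $1/\mu_i$ denoting respectively the mean up and down times of component $i$, $i = 1, \ldots, n$.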
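Equations (9.31) through (9.33) apply to any reliability function, so they can be evaluated generically. The sketch below is illustrative only: `up_down_means`, `series`, and `parallel` are hypothetical names, and the reliability function is assumed to take a list of component probabilities.

```python
import math

def up_down_means(r, lams, mus):
    """Return (mean up time, mean down time, system failure rate) from (9.31)-(9.33).

    r    : reliability function taking a list of component probabilities
    lams : reciprocals of the mean component lifetimes
    mus  : reciprocals of the mean component repair times
    """
    p = [m / (l + m) for l, m in zip(lams, mus)]   # limiting component availabilities
    availability = r(p)
    rate = 0.0
    for i, (l, m) in enumerate(zip(lams, mus)):
        p_on = p[:i] + [1.0] + p[i + 1:]            # the vector (1_i, p)
        p_off = p[:i] + [0.0] + p[i + 1:]           # the vector (0_i, p)
        rate += (l * m / (l + m)) * (r(p_on) - r(p_off))
    mean_up = availability / rate
    mean_down = (1.0 - availability) * mean_up / availability
    return mean_up, mean_down, rate

def series(p):
    return math.prod(p)

def parallel(p):
    return 1.0 - math.prod(1.0 - q for q in p)

lams, mus = [1.0, 2.0], [3.0, 2.0]   # hypothetical rates
print(up_down_means(series, lams, mus))
print(up_down_means(parallel, lams, mus))
```

For the series structure the mean up time should reduce to $1/\sum_i \lambda_i$, and for the parallel structure the mean down time should reduce to $1/\sum_i \mu_i$, matching Example 9.32.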


9.7.1 A Series Model with Suspended Animation 

Consider a series consisting of n components, and suppose that whenever a component 
(and thus the system) goes down, repair begins on that component and each of the other 
components enters a state of suspended animation. That is, after the down component is 
repaired, the other components resume operation in exactly the same condition as when 
the failure occurred. If two or more components go down simultaneously, one of them is 
arbitrarily chosen as being the failed component and repair on that component begins; 
the others that went down at the same time are considered to be in a state of suspended 
animation, and they will instantaneously go down when the repair is completed. We 
suppose that (not counting any time in suspended animation) the distribution of time that component $i$ functions is $F_i$ with mean $u_i$, whereas its repair distribution is $G_i$ with mean $d_i$, $i = 1, \ldots, n$.

To determine the long-run proportion of time this system is working, we reason as follows. To begin, consider the time, call it $T$, at which the system has been up for a time $t$. Now, when the system is up, the failure times of component $i$ constitute a renewal process with mean interarrival time $u_i$. Therefore, it follows that

$$\text{number of failures of } i \text{ in time } T \approx \frac{t}{u_i}$$

As the average repair time of $i$ is $d_i$, the preceding implies that

$$\text{total repair time of } i \text{ in time } T \approx \frac{t d_i}{u_i}$$

Therefore, in the period of time in which the system has been up for a time $t$, the total system downtime has approximately been

$$t \sum_{i=1}^{n} d_i/u_i$$

Hence, the proportion of time that the system has been up is approximately

$$\frac{t}{t + t \sum_{i=1}^{n} d_i/u_i}$$

Because this approximation should become exact as we let $t$ become larger, it follows that

$$\text{proportion of time the system is up} = \frac{1}{1 + \sum_i d_i/u_i} \qquad (9.34)$$

which also shows that

$$\text{proportion of time the system is down} = 1 - \text{proportion of time the system is up} = \frac{\sum_i d_i/u_i}{1 + \sum_i d_i/u_i}$$

Moreover, in the time interval from 0 to $T$, the proportion of the repair time that has been devoted to component $i$ is approximately

$$\frac{t d_i/u_i}{\sum_i t d_i/u_i}$$

Thus, in the long run,

$$\text{proportion of down time that is due to component } i = \frac{d_i/u_i}{\sum_i d_i/u_i}$$

Multiplying the preceding by the proportion of time the system is down gives

$$\text{proportion of time component } i \text{ is being repaired} = \frac{d_i/u_i}{1 + \sum_i d_i/u_i}$$

Also, since component $j$ will be in suspended animation whenever any of the other components is in repair, we see that

$$\text{proportion of time component } j \text{ is in suspended animation} = \frac{\sum_{i \ne j} d_i/u_i}{1 + \sum_i d_i/u_i}$$


Another quantity of interest is the long-run rate at which the system fails. Since component $j$ fails at rate $1/u_j$ when the system is up, and does not fail when the system is down, it follows that

$$\text{rate at which } j \text{ fails} = \frac{\text{proportion of time system is up}}{u_j} = \frac{1/u_j}{1 + \sum_i d_i/u_i}$$

Since the system fails when any of its components fail, the preceding yields that

$$\text{rate at which the system fails} = \frac{\sum_j 1/u_j}{1 + \sum_i d_i/u_i} \qquad (9.35)$$

If we partition the time axis into periods when the system is up and those when it is down, we can determine the average length of an up period by noting that if $U(t)$ is the total amount of time that the system is up in the interval $[0, t]$, and if $N(t)$ is the number of failures by time $t$, then

$$\text{average length of an up period} = \lim_{t \to \infty} \frac{U(t)}{N(t)} = \lim_{t \to \infty} \frac{U(t)/t}{N(t)/t} = \frac{1}{\sum_i 1/u_i}$$

where the final equality used Equations (9.34) and (9.35). Also, in a similar manner it can be shown that

$$\text{average length of a down period} = \frac{\sum_i d_i/u_i}{\sum_i 1/u_i} \qquad (9.36)$$
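The long-run quantities of the suspended animation model depend only on the means $u_i$ and $d_i$, so they can be collected in one small routine. This is a sketch under that reading of Equations (9.34) through (9.36); the function name and the dictionary keys are hypothetical.

```python
def suspended_animation_summary(u, d):
    """Long-run quantities for the series model with suspended animation.

    u : mean functioning times u_i (excluding time in suspended animation)
    d : mean repair times d_i
    """
    ratios = [di / ui for ui, di in zip(u, d)]
    total_ratio = sum(ratios)
    total_rate = sum(1.0 / ui for ui in u)
    prop_up = 1.0 / (1.0 + total_ratio)                       # Equation (9.34)
    return {
        "prop_up": prop_up,
        "prop_down": total_ratio * prop_up,
        "prop_in_repair": [x * prop_up for x in ratios],
        "prop_suspended": [(total_ratio - x) * prop_up for x in ratios],
        "system_failure_rate": total_rate * prop_up,          # Equation (9.35)
        "mean_up_period": 1.0 / total_rate,
        "mean_down_period": total_ratio / total_rate,         # Equation (9.36)
    }

summary = suspended_animation_summary(u=[10.0, 20.0], d=[1.0, 2.0])
for name, value in summary.items():
    print(name, value)
```

A useful consistency check is that the mean up period divided by the sum of the mean up and down periods equals the proportion of time the system is up.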


Exercises 

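1. Prove that, for any structure function $\phi$,

$$\phi(\mathbf{x}) = x_i \phi(1_i, \mathbf{x}) + (1 - x_i) \phi(0_i, \mathbf{x})$$

where

$$(1_i, \mathbf{x}) = (x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n),$$

$$(0_i, \mathbf{x}) = (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$$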

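2. Show that

(a) if $\phi(0, 0, \ldots, 0) = 0$ and $\phi(1, 1, \ldots, 1) = 1$, then

$$\min_i x_i \le \phi(\mathbf{x}) \le \max_i x_i$$

(b) $\phi(\max(\mathbf{x}, \mathbf{y})) \ge \max(\phi(\mathbf{x}), \phi(\mathbf{y}))$

(c) $\phi(\min(\mathbf{x}, \mathbf{y})) \le \min(\phi(\mathbf{x}), \phi(\mathbf{y}))$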

3. For any structure function, we define the dual structure $\phi^D$ by

$$\phi^D(\mathbf{x}) = 1 - \phi(\mathbf{1} - \mathbf{x})$$

(a) Show that the dual of a parallel (series) system is a series (parallel) system.

(b) Show that the dual of a dual structure is the original structure.

(c) What is the dual of a $k$-out-of-$n$ structure?

(d) Show that a minimal path (cut) set of the dual system is a minimal cut (path) set of the original structure.

*4. Write the structure function corresponding to the following:

(a) See Figure 9.16:

Figure 9.16

(b) See Figure 9.17:

(c) See Figure 9.18:

Figure 9.18

5. Find the minimal path and minimal cut sets for:

(a) See Figure 9.19:

Figure 9.19

(b) See Figure 9.20:

Figure 9.20

*6. The minimal path sets are {1, 2, 4}, {1, 3, 5}, and {5, 6}. Give the minimal cut 
sets. 

7. The minimal cut sets are {1, 2, 3}, {2, 3, 4}, and {3, 5}. What are the minimal 
path sets? 

8. Give the minimal path sets and the minimal cut sets for the structure given by 
Figure 9.21 

Figure 9.21

9. Component $i$ is said to be relevant to the system if for some state vector $\mathbf{x}$,

$$\phi(1_i, \mathbf{x}) = 1, \quad \phi(0_i, \mathbf{x}) = 0$$

Otherwise, it is said to be irrelevant.

(a) Explain in words what it means for a component to be irrelevant.

(b) Let $A_1, \ldots, A_s$ be the minimal path sets of a system, and let $S$ denote the set of components. Show that $S = \bigcup_{i=1}^{s} A_i$ if and only if all components are relevant.

(c) Let $C_1, \ldots, C_k$ denote the minimal cut sets. Show that $S = \bigcup_{i=1}^{k} C_i$ if and only if all components are relevant.

10. Let $t_i$ denote the time of failure of the $i$th component; let $\tau_\phi(\mathbf{t})$ denote the time to failure of the system $\phi$ as a function of the vector $\mathbf{t} = (t_1, \ldots, t_n)$. Show that

$$\max_{1 \le j \le s} \min_{i \in A_j} t_i = \tau_\phi(\mathbf{t}) = \min_{1 \le j \le k} \max_{i \in C_j} t_i$$

where $C_1, \ldots, C_k$ are the minimal cut sets, and $A_1, \ldots, A_s$ the minimal path sets.

11. Give the reliability function of the structure of Exercise 8.

*12. Give the minimal path sets and the reliability function for the structure in Figure 9.22.

Figure 9.22

13. Let $r(\mathbf{p})$ be the reliability function. Show that

$$r(\mathbf{p}) = p_i r(1_i, \mathbf{p}) + (1 - p_i) r(0_i, \mathbf{p})$$

14. Compute the reliability function of the bridge system (see Figure 9.11) by conditioning upon whether or not component 3 is working.

15. Compute upper and lower bounds of the reliability function (using Method 2) for 
the systems given in Exercise 4, and compare them with the exact values when 



16. Compute the upper and lower bounds of r(p) using both methods for the 

(a) two-out-of-three system and 

(b) two-out-of-four system. 

(c) Compare these bounds with the exact reliability when 

(i) $p_i = 0.5$

(ii) $p_i = 0.8$

(iii) $p_i = 0.2$

*17. Let $N$ be a nonnegative, integer-valued random variable. Show that

$$P\{N > 0\} \ge \frac{(E[N])^2}{E[N^2]}$$

and explain how this inequality can be used to derive additional bounds on a reliability function.

Hint:

$$E[N^2] = E[N^2 \mid N > 0]\, P\{N > 0\} \quad \text{(Why?)}$$

$$\ge (E[N \mid N > 0])^2\, P\{N > 0\} \quad \text{(Why?)}$$

Now multiply both sides by $P\{N > 0\}$.

18. Consider a structure in which the minimal path sets are {1, 2, 3} and {3, 4, 5}.

(a) What are the minimal cut sets?

(b) If the component lifetimes are independent uniform (0, 1) random variables, determine the probability that the system life will be less than $\frac{1}{2}$.

19. Let $X_1, X_2, \ldots, X_n$ denote independent and identically distributed random variables and define the order statistics $X_{(1)}, \ldots, X_{(n)}$ by

$$X_{(i)} = i\text{th smallest of } X_1, \ldots, X_n$$

Show that if the distribution of $X_j$ is IFR, then so is the distribution of $X_{(i)}$.
Hint: Relate this to one of the examples of this chapter.

20. Let $F$ be a continuous distribution function. For some positive $a$, define the distribution function $G$ by

$$\bar{G}(t) = (\bar{F}(t))^a$$

Find the relationship between $\lambda_G(t)$ and $\lambda_F(t)$, the respective failure rate functions of $G$ and $F$.

21. Consider the following four structures:

(i) See Figure 9.23:

Figure 9.23

(ii) See Figure 9.24:

Figure 9.24

(iii) See Figure 9.25:

Figure 9.25

(iv) See Figure 9.26:

Figure 9.26

Let $F_1$, $F_2$, and $F_3$ be the corresponding component failure distributions, each of which is assumed to be IFR (increasing failure rate). Let $F$ be the system failure distribution. All components are independent.

(a) For which structures is $F$ necessarily IFR if $F_1 = F_2 = F_3$? Give reasons.

(b) For which structures is $F$ necessarily IFR if $F_2 = F_3$? Give reasons.

(c) For which structures is $F$ necessarily IFR if $F_1 \ne F_2 \ne F_3$? Give reasons.

*22. Let $X$ denote the lifetime of an item. Suppose the item has reached the age of $t$. Let $X_t$ denote its remaining life and define

$$\bar{F}_t(a) = P\{X_t > a\}$$

In words, $\bar{F}_t(a)$ is the probability that a $t$-year-old item survives an additional time $a$. Show that

(a) $\bar{F}_t(a) = \bar{F}(t + a)/\bar{F}(t)$ where $F$ is the distribution function of $X$ and $\bar{F} = 1 - F$.

(b) Another definition of IFR is to say that $F$ is IFR if $\bar{F}_t(a)$ decreases in $t$, for all $a$. Show that this definition is equivalent to the one given in the text when $F$ has a density.

23. Show that if each (independent) component of a series system has an IFR distribution, then the system lifetime is itself IFR by

(a) showing that

$$\lambda_F(t) = \sum_i \lambda_i(t)$$

where $\lambda_F(t)$ is the failure rate function of the system, and $\lambda_i(t)$ the failure rate function of the lifetime of component $i$.

(b) using the definition of IFR given in Exercise 22. 

24. Show that if F is IFR, then it is also IFRA, and show by counterexample that the 
reverse is not true. 

*25. We say that $\zeta$ is a $p$-percentile of the distribution $F$ if $F(\zeta) = p$. Show that if $\zeta$ is a $p$-percentile of the IFRA distribution $F$, then

$$\bar{F}(x) \le e^{-\theta x}, \quad x \ge \zeta$$

$$\bar{F}(x) \ge e^{-\theta x}, \quad x \le \zeta$$

where

$$\theta = \frac{-\log(1 - p)}{\zeta}$$

26. Prove Lemma 9.3.

Hint: Let $x = y + \delta$. Note that $f(t) = t^{\alpha}$ is a concave function when $0 \le \alpha \le 1$, and use the fact that for a concave function $f(t + h) - f(t)$ is decreasing in $t$.

27. Let $r(p) = r(p, p, \ldots, p)$. Show that if $r(p_0) = p_0$, then

$$r(p) \ge p \quad \text{for } p \ge p_0$$

$$r(p) \le p \quad \text{for } p \le p_0$$

Hint: Use Proposition 9.2.

28. Find the mean lifetime of a series system of two components when the component lifetimes are respectively uniform on (0, 1) and uniform on (0, 2). Repeat for a parallel system.

29. Show that the mean lifetime of a parallel system of two components is

$$\frac{1}{\mu_1 + \mu_2} + \frac{\mu_1}{(\mu_1 + \mu_2)\mu_2} + \frac{\mu_2}{(\mu_1 + \mu_2)\mu_1}$$

when the first component is exponentially distributed with mean $1/\mu_1$ and the second is exponential with mean $1/\mu_2$.

*30. Compute the expected system lifetime of a three-out-of-four system when the first two component lifetimes are uniform on (0, 1) and the second two are uniform on (0, 2).

31. Show that the variance of the lifetime of a $k$-out-of-$n$ system of components, each of whose lifetimes is exponential with mean $\theta$, is given by

$$\theta^2 \sum_{i=k}^{n} \frac{1}{i^2}$$

32. In Section 9.6.1 show that the expected number of $X_i$ that exceed $c^*$ is equal to 1.

33. Let $X_i$ be an exponential random variable with mean $8 + 2i$ for $i = 1, 2, 3$. Use the results of Section 9.6.1 to obtain an upper bound on $E[\max X_i]$, and then compare this with the exact result when the $X_i$ are independent.

34. For the model of Section 9.7, compute for a $k$-out-of-$n$ structure (i) the average up time, (ii) the average down time, and (iii) the system failure rate.

35. Prove the combinatorial identity



(a) by induction on $i$

(b) by a backwards induction argument on $i$ (that is, prove it first for $i = n$, then assume it for $i = k$ and show that this implies that it is true for $i = k - 1$).

36. Verify Equation (9.36).

Brownian Motion and Stationary Processes



10.1 Brownian Motion 


Let us start by considering the symmetric random walk, which in each time unit is equally likely to take a unit step either to the left or to the right. That is, it is a Markov chain with $P_{i,i+1} = \frac{1}{2} = P_{i,i-1}$, $i = 0, \pm 1, \ldots$. Now suppose that we speed up this process by taking smaller and smaller steps in smaller and smaller time intervals. If we now go to the limit in the right manner, what we obtain is Brownian motion.

More precisely, suppose that each $\Delta t$ time unit we take a step of size $\Delta x$ either to the left or the right with equal probabilities. If we let $X(t)$ denote the position at time $t$, then

$$X(t) = \Delta x (X_1 + \cdots + X_{[t/\Delta t]}) \qquad (10.1)$$

where

$$X_i = \begin{cases} +1, & \text{if the } i\text{th step of length } \Delta x \text{ is to the right} \\ -1, & \text{if it is to the left} \end{cases}$$

$[t/\Delta t]$ is the largest integer less than or equal to $t/\Delta t$, and the $X_i$ are assumed independent with

$$P\{X_i = 1\} = P\{X_i = -1\} = \tfrac{1}{2}$$

As $E[X_i] = 0$, $\text{Var}(X_i) = E[X_i^2] = 1$, we see from Equation (10.1) that

$$E[X(t)] = 0, \qquad \text{Var}(X(t)) = (\Delta x)^2 \left[ \frac{t}{\Delta t} \right] \qquad (10.2)$$


Introduction to Probability Models, Eleventh Edition. http://dx.doi.org/10.1016/B978-0-12-407948-9.00010-4

We shall now let $\Delta x$ and $\Delta t$ go to 0. However, we must do it in a way such that the resulting limiting process is nontrivial (for instance, if we let $\Delta x = \Delta t$ and let $\Delta t \to 0$, then from the preceding we see that $E[X(t)]$ and $\text{Var}(X(t))$ would both converge to 0 and thus $X(t)$ would equal 0 with probability 1). If we let $\Delta x = \sigma \sqrt{\Delta t}$ for some positive constant $\sigma$ then from Equation (10.2) we see that as $\Delta t \to 0$

$$E[X(t)] = 0,$$

$$\text{Var}(X(t)) \to \sigma^2 t$$
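The scaling $\Delta x = \sigma \sqrt{\Delta t}$ can be explored by simulating the walk of Equation (10.1) directly. The following is only a rough sketch: the function names, the seed, the sample size, and the parameter values are arbitrary choices made for illustration.

```python
import math
import random

def simulate_scaled_walk(t, dt, sigma, rng):
    """Position at time t of the walk in Equation (10.1) with dx = sigma * sqrt(dt)."""
    dx = sigma * math.sqrt(dt)
    steps = math.floor(t / dt)
    return dx * sum(rng.choice((-1, 1)) for _ in range(steps))

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)                       # fixed seed so the run is repeatable
t, dt, sigma = 2.0, 0.01, 1.5
positions = [simulate_scaled_walk(t, dt, sigma, rng) for _ in range(2000)]
estimate = sample_variance(positions)
print(estimate, sigma ** 2 * t)              # the estimate should be close to sigma^2 * t
```

Because the estimate comes from a finite sample, it will only approximate $\sigma^2 t$; shrinking `dt` changes the step size but should leave the variance essentially unchanged, which is the point of the scaling.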

We now list some intuitive properties of this limiting process obtained by taking $\Delta x = \sigma \sqrt{\Delta t}$ and then letting $\Delta t \to 0$. From Equation (10.1) and the central limit theorem the following seems reasonable:

(i) $X(t)$ is normal with mean 0 and variance $\sigma^2 t$. In addition, because the changes of value of the random walk in nonoverlapping time intervals are independent,

(ii) $\{X(t), t \ge 0\}$ has independent increments, in that for all $t_1 < t_2 < \cdots < t_n$

$$X(t_n) - X(t_{n-1}),\ X(t_{n-1}) - X(t_{n-2}),\ \ldots,\ X(t_2) - X(t_1),\ X(t_1)$$

are independent. Finally, because the distribution of the change in position of the random walk over any time interval depends only on the length of that interval, it would appear that

(iii) $\{X(t), t \ge 0\}$ has stationary increments, in that the distribution of $X(t + s) - X(t)$ does not depend on $t$. We are now ready for the following formal definition.

Definition 10.1 A stochastic process $\{X(t), t \ge 0\}$ is said to be a Brownian motion process if

(i) $X(0) = 0$;

(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;

(iii) for every $t > 0$, $X(t)$ is normally distributed with mean 0 and variance $\sigma^2 t$.

The Brownian motion process, sometimes called the Wiener process, is one of the 
most useful stochastic processes in applied probability theory. It originated in physics as 
a description of Brownian motion. This phenomenon, named after the English botanist 
Robert Brown who discovered it, is the motion exhibited by a small particle that is 
totally immersed in a liquid or gas. Since then, the process has been used beneficially 
in such areas as statistical testing of goodness of fit, analyzing the price levels on the 
stock market, and quantum mechanics. 

The first explanation of the phenomenon of Brownian motion was given by Einstein in 1905. He showed that Brownian motion could be explained by assuming that the immersed particle was continually being subjected to bombardment by the molecules of the surrounding medium. However, the preceding concise definition of this stochastic process underlying Brownian motion was given by Wiener in a series of papers originating in 1918.

When $\sigma = 1$, the process is called standard Brownian motion. Because any Brownian motion can be converted to the standard process by letting $B(t) = X(t)/\sigma$ we shall, unless otherwise stated, suppose throughout this chapter that $\sigma = 1$.

The interpretation of Brownian motion as the limit of the random walks (Equation (10.1)) suggests that $X(t)$ should be a continuous function of $t$, which turns out to be true. To prove this, we must show that with probability 1

$$\lim_{h \to 0} (X(t + h) - X(t)) = 0$$

Although a rigorous proof of the preceding is beyond the scope of this text, a plausibility argument is obtained by noting that the random variable $X(t + h) - X(t)$ has mean 0 and variance $h$, and so would seem to converge to a random variable with mean 0 and variance 0 as $h \to 0$. That is, it seems reasonable that $X(t + h) - X(t)$ converges to 0, thus yielding continuity.

Although $X(t)$ will, with probability 1, be a continuous function of $t$, it possesses the interesting property of being nowhere differentiable. To see why this might be the case, note that $\frac{X(t + h) - X(t)}{h}$ has mean 0 and variance $1/h$. Because the variance of $\frac{X(t + h) - X(t)}{h}$ converges to $\infty$ as $h \to 0$, it is not surprising that the ratio does not converge.

As $X(t)$ is normal with mean 0 and variance $t$, its density function is given by

$$f_t(x) = \frac{1}{\sqrt{2\pi t}} e^{-x^2/2t}$$

To obtain the joint density function of $X(t_1), X(t_2), \ldots, X(t_n)$ for $t_1 < \cdots < t_n$, note first that the set of equalities

$$X(t_1) = x_1,\quad X(t_2) = x_2,\quad \ldots,\quad X(t_n) = x_n$$

is equivalent to

$$X(t_1) = x_1,\quad X(t_2) - X(t_1) = x_2 - x_1,\quad \ldots,\quad X(t_n) - X(t_{n-1}) = x_n - x_{n-1}$$

However, by the independent increment assumption it follows that $X(t_1), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent and, by the stationary increment assumption, that $X(t_k) - X(t_{k-1})$ is normal with mean 0 and variance $t_k - t_{k-1}$. Hence, the joint density of $X(t_1), \ldots, X(t_n)$ is given by

$$f(x_1, x_2, \ldots, x_n) = f_{t_1}(x_1) f_{t_2 - t_1}(x_2 - x_1) \cdots f_{t_n - t_{n-1}}(x_n - x_{n-1})$$

$$= \frac{\exp\left\{ -\dfrac{1}{2} \left[ \dfrac{x_1^2}{t_1} + \dfrac{(x_2 - x_1)^2}{t_2 - t_1} + \cdots + \dfrac{(x_n - x_{n-1})^2}{t_n - t_{n-1}} \right] \right\}}{(2\pi)^{n/2} [t_1 (t_2 - t_1) \cdots (t_n - t_{n-1})]^{1/2}} \qquad (10.3)$$

From this equation, we can compute in principle any desired probabilities. For instance, suppose we require the conditional distribution of $X(s)$ given that $X(t) = B$ where $s < t$. The conditional density is

$$f_{s|t}(x \mid B) = \frac{f_s(x) f_{t-s}(B - x)}{f_t(B)}$$

$$= K_1 \exp\left\{ -\frac{x^2}{2s} - \frac{(B - x)^2}{2(t - s)} \right\}$$

$$= K_2 \exp\left\{ -x^2 \left( \frac{1}{2s} + \frac{1}{2(t - s)} \right) + \frac{Bx}{t - s} \right\}$$

$$= K_2 \exp\left\{ -\frac{t}{2s(t - s)} \left( x^2 - 2\frac{sB}{t} x \right) \right\}$$

$$= K_3 \exp\left\{ -\frac{(x - Bs/t)^2}{2s(t - s)/t} \right\}$$

where $K_1$, $K_2$, and $K_3$ do not depend on $x$. Hence, we see from the preceding that the conditional distribution of $X(s)$ given that $X(t) = B$ is, for $s < t$, normal with mean and variance given by

$$E[X(s) \mid X(t) = B] = \frac{s}{t} B,$$

$$\text{Var}[X(s) \mid X(t) = B] = \frac{s}{t}(t - s) \qquad (10.4)$$
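Equation (10.4) can be cross-checked against the general formula for conditioning one normal random variable on another. The sketch below assumes standard Brownian motion and uses the fact that, by independent increments, $\text{Cov}(X(s), X(t)) = s$ for $s < t$; the function name is hypothetical.

```python
def conditional_given_later_value(s, t, b):
    """Mean and variance of X(s) given X(t) = b for standard Brownian motion, 0 < s < t."""
    if not 0.0 < s < t:
        raise ValueError("requires 0 < s < t")
    cov_st = s                     # Cov(X(s), X(t)) = Var(X(s)) = s when s < t
    var_s, var_t = s, t
    mean = (cov_st / var_t) * b    # bivariate normal conditional mean (zero means)
    variance = var_s - cov_st ** 2 / var_t
    return mean, variance

mean, variance = conditional_given_later_value(s=1.0, t=4.0, b=2.0)
print(mean, variance)
```

The two returned values should coincide with $sB/t$ and $s(t - s)/t$, the expressions obtained above by completing the square.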


Example 10.1 In a bicycle race between two competitors, let $Y(t)$ denote the amount of time (in seconds) by which the racer that started in the inside position is ahead when $100t$ percent of the race has been completed, and suppose that $\{Y(t), 0 \le t \le 1\}$ can be effectively modeled as a Brownian motion process with variance parameter $\sigma^2$.

(a) If the inside racer is leading by $\sigma$ seconds at the midpoint of the race, what is the probability that she is the winner?

(b) If the inside racer wins the race by a margin of $\sigma$ seconds, what is the probability that she was ahead at the midpoint?

Solution:

(a) $P\{Y(1) > 0 \mid Y(1/2) = \sigma\}$

$$= P\{Y(1) - Y(1/2) > -\sigma \mid Y(1/2) = \sigma\}$$

$$= P\{Y(1) - Y(1/2) > -\sigma\} \quad \text{by independent increments}$$

$$= P\{Y(1/2) > -\sigma\} \quad \text{by stationary increments}$$

$$= P\left\{ \frac{Y(1/2)}{\sigma/\sqrt{2}} > -\sqrt{2} \right\}$$

$$= \Phi(\sqrt{2})$$

$$\approx 0.9213$$

where $\Phi(x) = P\{N(0, 1) \le x\}$ is the standard normal distribution function.

(b) Because we must compute $P\{Y(1/2) > 0 \mid Y(1) = \sigma\}$, let us first determine the conditional distribution of $Y(s)$ given that $Y(t) = C$, when $s < t$. Now, since $\{X(t), t \ge 0\}$ is standard Brownian motion when $X(t) = Y(t)/\sigma$, we obtain from Equation (10.4) that the conditional distribution of $X(s)$, given that $X(t) = C/\sigma$, is normal with mean $sC/t\sigma$ and variance $s(t - s)/t$. Hence, the conditional distribution of $Y(s) = \sigma X(s)$ given that $Y(t) = C$ is normal with mean $sC/t$ and variance $\sigma^2 s(t - s)/t$. Hence,

$$P\{Y(1/2) > 0 \mid Y(1) = \sigma\} = P\{N(\sigma/2, \sigma^2/4) > 0\}$$

$$= \Phi(1)$$

$$\approx 0.8413$$
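The two probabilities in Example 10.1 can be reproduced with the standard library. This is a small sketch; the helper names are hypothetical, and the normal distribution function is written in terms of `math.erf`.

```python
import math

def std_normal_cdf(x):
    """Phi(x) = P{N(0, 1) <= x}, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_win_given_midpoint_lead(lead, sigma):
    """P{Y(1) > 0 | Y(1/2) = lead}; the increment Y(1) - Y(1/2) is N(0, sigma^2 / 2)."""
    return std_normal_cdf(lead / (sigma * math.sqrt(0.5)))

def prob_ahead_at_midpoint_given_margin(margin, sigma):
    """P{Y(1/2) > 0 | Y(1) = margin}; the conditional law is N(margin / 2, sigma^2 / 4)."""
    return std_normal_cdf((margin / 2.0) / (sigma / 2.0))

sigma = 3.0   # any positive value gives the same answers when the lead equals sigma
print(prob_win_given_midpoint_lead(sigma, sigma))         # part (a), Phi(sqrt(2))
print(prob_ahead_at_midpoint_given_margin(sigma, sigma))  # part (b), Phi(1)
```

Since the lead and the margin are both taken equal to $\sigma$, the answers do not depend on the particular value of $\sigma$ chosen.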


10.2 Hitting Times, Maximum Variable, and the 
Gambler's Ruin Problem 

Let $T_a$ denote the first time the Brownian motion process hits $a$. When $a > 0$ we will compute $P\{T_a \le t\}$ by considering $P\{X(t) \ge a\}$ and conditioning on whether or not $T_a \le t$. This gives

$$P\{X(t) \ge a\} = P\{X(t) \ge a \mid T_a \le t\}\, P\{T_a \le t\} + P\{X(t) \ge a \mid T_a > t\}\, P\{T_a > t\} \qquad (10.5)$$

Now if $T_a \le t$, then the process hits $a$ at some point in $[0, t]$ and, by symmetry, it is just as likely to be above $a$ or below $a$ at time $t$. That is,

$$P\{X(t) \ge a \mid T_a \le t\} = \tfrac{1}{2}$$

As the second right-hand term of Equation (10.5) is clearly equal to 0 (since, by continuity, the process value cannot be greater than $a$ without having yet hit $a$), we see that

$$P\{T_a \le t\} = 2P\{X(t) \ge a\} = \frac{2}{\sqrt{2\pi t}} \int_a^{\infty} e^{-x^2/2t}\, dx = \frac{2}{\sqrt{2\pi}} \int_{a/\sqrt{t}}^{\infty} e^{-y^2/2}\, dy, \quad a > 0 \qquad (10.6)$$

For $a < 0$, the distribution of $T_a$ is, by symmetry, the same as that of $T_{-a}$. Hence, from Equation (10.6) we obtain

$$P\{T_a \le t\} = \frac{2}{\sqrt{2\pi}} \int_{|a|/\sqrt{t}}^{\infty} e^{-y^2/2}\, dy \qquad (10.7)$$

Another random variable of interest is the maximum value the process attains in $[0, t]$. Its distribution is obtained as follows: For $a > 0$

$$P\left\{ \max_{0 \le s \le t} X(s) \ge a \right\} = P\{T_a \le t\} \quad \text{by continuity}$$

$$= 2P\{X(t) \ge a\} \quad \text{from (10.6)}$$

$$= \frac{2}{\sqrt{2\pi}} \int_{a/\sqrt{t}}^{\infty} e^{-y^2/2}\, dy$$
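Both the hitting-time distribution and the distribution of the maximum reduce to a standard normal tail probability, so they are easy to tabulate. The sketch below assumes standard Brownian motion; the function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Phi(x) = P{N(0, 1) <= x}."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hitting_time_cdf(a, t):
    """P{T_a <= t} = 2 * P{N(0, 1) >= |a| / sqrt(t)} for a != 0 and t > 0."""
    if a == 0 or t <= 0:
        raise ValueError("requires a != 0 and t > 0")
    return 2.0 * (1.0 - std_normal_cdf(abs(a) / math.sqrt(t)))

def max_exceeds(a, t):
    """P{max over [0, t] of X(s) >= a} for a > 0, which equals P{T_a <= t}."""
    if a <= 0:
        raise ValueError("requires a > 0")
    return hitting_time_cdf(a, t)

for t in (0.25, 1.0, 4.0, 100.0):
    print(t, hitting_time_cdf(1.0, t))
```

The printed probabilities increase toward 1 as $t$ grows, reflecting that the process eventually hits any level $a$ with probability 1.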


Let us now consider the probability that Brownian motion hits $A$ before $-B$ where $A > 0$, $B > 0$. To compute this we shall make use of the interpretation of Brownian motion as being a limit of the symmetric random walk. To start, let us recall from the results of the gambler's ruin problem (see Section 4.5.1) that the probability that the symmetric random walk goes up $A$ before going down $B$ when each step is equally likely to be either up or down a distance $\Delta x$ is (by Equation (4.14) with $N = (A + B)/\Delta x$, $i = B/\Delta x$) equal to $B\Delta x/((A + B)\Delta x) = B/(A + B)$.

Hence, upon letting $\Delta x \to 0$, we see that

$$P\{\text{up } A \text{ before down } B\} = \frac{B}{A + B}$$


10.3 Variations on Brownian Motion 

10.3.1 Brownian Motion with Drift 

We say that $\{X(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$ and variance parameter $\sigma^2$ if

(i) $X(0) = 0$;

(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;

(iii) $X(t)$ is normally distributed with mean $\mu t$ and variance $t\sigma^2$.

An equivalent definition is to let $\{B(t), t \ge 0\}$ be standard Brownian motion and then define

$$X(t) = \sigma B(t) + \mu t$$

It follows from this representation that $X(t)$ will also be a continuous function of $t$.

10.3.2 Geometric Brownian Motion 

If $\{Y(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$ and variance parameter $\sigma^2$, then the process $\{X(t), t \ge 0\}$ defined by

$$X(t) = e^{Y(t)}$$

is called geometric Brownian motion.

For a geometric Brownian motion process $\{X(t)\}$, let us compute the expected value of the process at time $t$ given the history of the process up to time $s$. That is, for $s < t$, consider $E[X(t) \mid X(u), 0 \le u \le s]$. Now,

$$E[X(t) \mid X(u), 0 \le u \le s] = E[e^{Y(t)} \mid Y(u), 0 \le u \le s]$$

$$= E[e^{Y(s) + Y(t) - Y(s)} \mid Y(u), 0 \le u \le s]$$

$$= e^{Y(s)} E[e^{Y(t) - Y(s)} \mid Y(u), 0 \le u \le s]$$

$$= X(s) E[e^{Y(t) - Y(s)}]$$

where the next to last equality follows from the fact that $Y(s)$ is given, and the last equality from the independent increment property of Brownian motion. Now, the moment generating function of a normal random variable $W$ is given by

$$E[e^{aW}] = e^{aE[W] + a^2 \text{Var}(W)/2}$$

Hence, since $Y(t) - Y(s)$ is normal with mean $\mu(t - s)$ and variance $(t - s)\sigma^2$, it follows by setting $a = 1$ that

$$E[e^{Y(t) - Y(s)}] = e^{\mu(t - s) + (t - s)\sigma^2/2}$$

Thus, we obtain

$$E[X(t) \mid X(u), 0 \le u \le s] = X(s) e^{(t - s)(\mu + \sigma^2/2)} \qquad (10.8)$$
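Equation (10.8) is simple to evaluate once the current value of the process is known. The following sketch uses a hypothetical function name and made-up parameter values purely for illustration.

```python
import math

def gbm_conditional_mean(x_s, s, t, mu, sigma):
    """E[X(t) | history up to s] = X(s) * exp((t - s) * (mu + sigma^2 / 2)), for s <= t."""
    if t < s:
        raise ValueError("requires s <= t")
    return x_s * math.exp((t - s) * (mu + sigma ** 2 / 2.0))

# Hypothetical values: current price 100, drift 0.05, volatility 0.2, one time unit ahead.
print(gbm_conditional_mean(100.0, s=0.0, t=1.0, mu=0.05, sigma=0.2))

# With mu = -sigma^2 / 2 the exponent vanishes and the expected future value equals X(s).
print(gbm_conditional_mean(100.0, s=0.0, t=1.0, mu=-0.02, sigma=0.2))
```

The second call illustrates that the expected future value is determined by $\mu + \sigma^2/2$ rather than by $\mu$ alone.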

Geometric Brownian motion is useful in the modeling of stock prices over time when you feel that the percentage changes are independent and identically distributed. For instance, suppose that X_n is the price of some stock at time n. Then, it might be reasonable to suppose that X_n/X_{n-1}, n ≥ 1, are independent and identically distributed. Let

Y_n = X_n/X_{n-1}

and so

X_n = Y_n X_{n-1}

Iterating this equality gives

X_n = Y_n Y_{n-1} X_{n-2}
    = Y_n Y_{n-1} Y_{n-2} X_{n-3}
    ...
    = Y_n Y_{n-1} ··· Y_1 X_0

Thus,

log(X_n) = Σ_{i=1}^{n} log(Y_i) + log(X_0)

Since log(Y_i), i ≥ 1, are independent and identically distributed, {log(X_n)} will, when suitably normalized, approximately be Brownian motion with a drift, and so {X_n} will be approximately geometric Brownian motion.

10.4 Pricing Stock Options

10.4.1 An Example in Options Pricing

In situations in which money is to be received or paid out in differing time periods, we must take into account the time value of money. That is, to be given the amount v a time t in the future is not worth as much as being given v immediately. The reason for this is that if we were immediately given v, then it could be loaned out with interest and so be worth more than v at time t. To take this into account, we will suppose that the time 0 value, also called the present value, of the amount v to be earned at time t is ve^{-αt}. The quantity α is often called the discount factor. In economic terms, the assumption of the discount function e^{-αt} is equivalent to the assumption that we can earn interest at a continuously compounded rate of 100α percent per unit time.

We will now consider a simple model for pricing an option to purchase a stock at a future time at a fixed price.

Suppose the present price of a stock is $100 per unit share, and suppose we know that after one time period it will be, in present value dollars, either $200 or $50 (see Figure 10.1). It should be noted that the prices at time 1 are the present value (or time 0) prices. That is, if the discount factor is α, then the actual possible prices at time 1 are either 200e^α or 50e^α. To keep the notation simple, we will suppose that all prices given are time 0 prices.

Suppose that for any y, at a cost of cy, you can purchase at time 0 the option to buy y
shares of the stock at time 1 at a (time 0) cost of $150 per share. Thus, for instance, if
you do purchase this option and the stock rises to $200, then you would exercise the
option at time 1 and realize a gain of $200 - 150 = $50 for each of the y option units
purchased. On the other hand, if the price at time 1 was $50, then the option would be
worthless at time 1. In addition, at a cost of 100x you can purchase x units of the stock
at time 0, and this will be worth either 200x or 50x at time 1.

We will suppose that both x and y can be either positive or negative (or zero). That is,
you can either buy or sell both the stock and the option. For instance, if x were negative
then you would be selling -x shares of the stock, yielding you a return of -100x, and
you would then be responsible for buying -x shares of the stock at time 1 at a cost of
either $200 or $50 per share.

We are interested in determining the appropriate value of c, the unit cost of an 
option. Specifically, we will show that unless c = 50/3 there will be a combination of 
purchases that will always result in a positive gain. 


[Figure 10.1: time 0 price 100; time 1 price either 200 or 50]

To show this, suppose that at time 0 we

buy x units of stock, and 
buy y units of options 

where x and y (which can be either positive or negative) are to be determined. The 
value of our holding at time 1 depends on the price of the stock at that time; and it is 
given by the following 

\[ \text{value} = \begin{cases} 200x + 50y, & \text{if price is 200} \\ 50x, & \text{if price is 50} \end{cases} \]

The preceding formula follows by noting that if the price is 200 then the x units of the
stock are worth 200x, and the y units of the option to buy the stock at a unit price of
150 are worth (200 - 150)y. On the other hand, if the stock price is 50, then the x units
are worth 50x and the y units of the option are worthless. Now, suppose we choose y
to be such that the preceding value is the same no matter what the price at time 1. That
is, we choose y so that

\[ 200x + 50y = 50x \]

or

\[ y = -3x \]

(Note that y has the opposite sign of x, and so if x is positive and as a result x units of
the stock are purchased at time 0, then 3x units of stock options are also sold at that
time. Similarly, if x is negative, then -x units of stock are sold and -3x units of stock
options are purchased at time 0.)

Thus, with y = -3x, the value of our holding at time 1 is

\[ \text{value} = 50x \]

Since the original cost of purchasing x units of the stock and -3x units of options is

\[ \text{original cost} = 100x - 3xc, \]

we see that our gain on the transaction is

\[ \text{gain} = 50x - (100x - 3xc) = x(3c - 50) \]

Thus, if 3c = 50, then the gain is 0; on the other hand if \(3c \neq 50\), we can guarantee a
positive gain (no matter what the price of the stock at time 1) by letting x be positive
when 3c > 50 and letting it be negative when 3c < 50.

For instance, if the unit cost per option is c = 20, then purchasing 1 unit of the stock
(x = 1) and simultaneously selling 3 units of the option (y = -3) initially costs us
100 - 60 = 40. However, the value of this holding at time 1 is 50 whether the stock
goes up to 200 or down to 50. Thus, a guaranteed profit of 10 is attained. Similarly,
if the unit cost per option is c = 15, then selling 1 unit of the stock (x = -1) and
buying 3 units of the option (y = 3) leads to an initial gain of 100 - 45 = 55. On the
other hand, the value of this holding at time 1 is -50. Thus, a guaranteed profit of 5 is
attained.
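
The two numerical cases above can be checked mechanically. The following Python sketch is only an illustration of the hedged position y = -3x from the preceding derivation; the function names are hypothetical.

```python
def hedged_position(c, x):
    """Hold x shares and y = -3x options, each option costing c.

    Returns (cost, value_if_200, value_if_50); all amounts are time-0 values.
    """
    y = -3 * x
    cost = 100 * x + c * y
    value_if_200 = 200 * x + 50 * y   # each option is worth 200 - 150 = 50
    value_if_50 = 50 * x              # the options are worthless
    return cost, value_if_200, value_if_50


def sure_gain(c, x):
    """Gain of the hedged position, which equals x(3c - 50) whichever price occurs."""
    cost, up, down = hedged_position(c, x)
    assert abs(up - down) < 1e-9      # the hedge removes the dependence on the price
    return up - cost


print(sure_gain(20, 1))    # 10
print(sure_gain(15, -1))   # 5
```

With c = 50/3 the same function returns 0 (up to rounding) for every x, in line with the claim that this is the only cost that leaves no sure gain.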

A sure win betting scheme is called an arbitrage. Thus, as we have just seen, the
only option cost c that does not result in an arbitrage is c = 50/3.

10.4.2 The Arbitrage Theorem

Consider an experiment whose set of possible outcomes is S = {1, 2, ..., m}. Suppose
that n wagers are available. If the amount x is bet on wager i, then the return \(x r_i(j)\) is
earned if the outcome of the experiment is j. In other words, \(r_i(\cdot)\) is the return function
for a unit bet on wager i. The amount bet on a wager is allowed to be either positive or
negative or zero.

A betting scheme is a vector \(x = (x_1, \ldots, x_n)\) with the interpretation that \(x_1\) is bet
on wager 1, \(x_2\) on wager 2, ..., and \(x_n\) on wager n. If the outcome of the experiment
is j, then the return from the betting scheme x is

\[ \text{return from } x = \sum_{i=1}^{n} x_i r_i(j) \]

The following theorem states that either there exists a probability vector \(p = (p_1, \ldots, p_m)\)
on the set of possible outcomes of the experiment under which each of the wagers
has expected return 0, or else there is a betting scheme that guarantees a positive win.

Theorem 10.1 (The Arbitrage Theorem) Exactly one of the following is true: Either

(i) there exists a probability vector \(p = (p_1, \ldots, p_m)\) for which

\[ \sum_{j=1}^{m} p_j r_i(j) = 0, \quad \text{for all } i = 1, \ldots, n \]

or

(ii) there exists a betting scheme \(x = (x_1, \ldots, x_n)\) for which

\[ \sum_{i=1}^{n} x_i r_i(j) > 0, \quad \text{for all } j = 1, \ldots, m \]

In other words, if X is the outcome of the experiment, then the arbitrage theorem
states that either there is a probability vector p for X such that

\[ E_p[r_i(X)] = 0, \quad \text{for all } i = 1, \ldots, n \]

or else there is a betting scheme that leads to a sure win.

Remark This theorem is a consequence of the (linear algebra) theorem of the separating
hyperplane, which is often used as a mechanism to prove the duality theorem
of linear programming.

The theory of linear programming can be used to determine a betting strategy that
guarantees the greatest return. Suppose that the absolute value of the amount bet on
each wager must be less than or equal to 1. To determine the vector x that yields the
greatest guaranteed win (call this win v) we need to choose x and v so as to maximize
v, subject to the constraints

\[ \sum_{i=1}^{n} x_i r_i(j) \geq v, \quad \text{for } j = 1, \ldots, m \]

\[ -1 \leq x_i \leq 1, \quad i = 1, \ldots, n \]

This optimization problem is a linear program and can be solved by standard techniques
(such as by using the simplex algorithm). The arbitrage theorem yields that the optimal
v will be positive unless there is a probability vector p for which \(\sum_{j=1}^{m} p_j r_i(j) = 0\)
for all i = 1, ..., n.

Example 10.2 In some situations, the only types of wagers allowed are to choose
one of the outcomes i, i = 1, ..., m, and bet that i is the outcome of the experiment.
The return from such a bet is often quoted in terms of "odds." If the odds for outcome
i are \(o_i\) (often written as "\(o_i\) to 1") then a 1-unit bet will return \(o_i\) if the outcome of the
experiment is i and will return -1 otherwise. That is,

\[ r_i(j) = \begin{cases} o_i, & \text{if } j = i \\ -1, & \text{otherwise} \end{cases} \]

Suppose the odds \(o_1, \ldots, o_m\) are posted. In order for there not to be a sure win there
must be a probability vector \(p = (p_1, \ldots, p_m)\) such that

\[ 0 = E_p[r_i(X)] = o_i p_i - (1 - p_i) \]

That is, we must have

\[ p_i = \frac{1}{1 + o_i} \]

Since the \(p_i\) must sum to 1, this means that the condition for there not to be an arbitrage
is that

\[ \sum_{i=1}^{m} (1 + o_i)^{-1} = 1 \]

Thus, if the posted odds are such that \(\sum_{i=1}^{m} (1 + o_i)^{-1} \neq 1\), then a sure win is possible.
For instance, suppose there are three possible outcomes and the odds are as follows:

Outcome   Odds
1         1
2         2
3         3

That is, the odds for outcome 1 are 1-1, the odds for outcome 2 are 2-1, and those
for outcome 3 are 3-1. Since

\[ \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = \frac{13}{12} \neq 1 \]

a sure win is possible. One possibility is to bet -1 on outcome 1 (and so you
win 1 if the outcome is not 1 and lose 1 if the outcome is 1) and bet -0.7 on
outcome 2, and -0.5 on outcome 3. If the experiment results in outcome 1, then we
win -1 + 0.7 + 0.5 = 0.2; if it results in outcome 2, then we win 1 - 1.4 + 0.5 = 0.1;
if it results in outcome 3, then we win 1 + 0.7 - 1.5 = 0.2. Hence, in all cases we win
a positive amount. ■

Remark If \(\sum_{k} (1 + o_k)^{-1} \neq 1\), then the betting scheme

\[ x_i = \frac{(1 + o_i)^{-1}}{1 - \sum_{k} (1 + o_k)^{-1}}, \quad i = 1, \ldots, m \]

will always yield a gain of exactly 1.
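
The odds example and the remark can both be verified with exact rational arithmetic. The sketch below is an illustration only; the function names are hypothetical, and odds are assumed to be quoted as integers or fractions.

```python
from fractions import Fraction


def implied_probabilities(odds):
    """The values 1/(1 + o_i) that make each unit bet have expected return 0."""
    return [Fraction(1, 1 + o) for o in odds]


def returns(bets, odds):
    """Total return for each outcome j: bet i pays odds[i] per unit if j == i, else -1."""
    m = len(odds)
    return [sum(b * (odds[i] if i == j else -1) for i, b in enumerate(bets))
            for j in range(m)]


def unit_gain_scheme(odds):
    """Bets x_i = (1 + o_i)^(-1) / (1 - sum_k (1 + o_k)^(-1)) from the remark."""
    probs = implied_probabilities(odds)
    total = sum(probs)
    if total == 1:
        raise ValueError("implied probabilities sum to 1, so no sure win exists")
    return [p / (1 - total) for p in probs]


odds = [1, 2, 3]
print(sum(implied_probabilities(odds)))                                  # 13/12
print(returns([Fraction(-1), Fraction(-7, 10), Fraction(-1, 2)], odds))  # 1/5, 1/10, 1/5
scheme = unit_gain_scheme(odds)                                          # bets -6, -4, -3
print(returns(scheme, odds))                                             # 1 for every outcome
```

For these odds the remark's scheme bets -6, -4 and -3 on the three outcomes, and the return is exactly 1 whichever outcome occurs.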

Example 10.3 Let us reconsider the option pricing example of the previous section, 
where the initial price of a stock is 100 and the present value of the price at time 1 is 
either 200 or 50. At a cost of c per share we can purchase at time 0 the option to buy 
the stock at time 1 at a present value price of 150 per share. The problem is to set the 
value of c so that no sure win is possible. 

In the context of this section, the outcome of the experiment is the value of the stock 
at time 1. Thus, there are two possible outcomes. There are also two different wagers: 
to buy (or sell) the stock, and to buy (or sell) the option. By the arbitrage theorem, there 
will be no sure win if there is a probability vector (p, 1 — p) that makes the expected 
return under both wagers equal to 0. 

Now, the return from purchasing 1 unit of the stock is

\[ \text{return} = \begin{cases} 200 - 100 = 100, & \text{if the price is 200 at time 1} \\ 50 - 100 = -50, & \text{if the price is 50 at time 1} \end{cases} \]

Hence, if p is the probability that the price is 200 at time 1, then

\[ E[\text{return}] = 100p - 50(1 - p) \]

Setting this equal to 0 yields

\[ p = \frac{1}{3} \]

That is, the only probability vector (p, 1 - p) for which wager 1 yields an expected
return 0 is the vector \((\frac{1}{3}, \frac{2}{3})\).

Now, the return from purchasing one share of the option is

\[ \text{return} = \begin{cases} 50 - c, & \text{if price is 200} \\ -c, & \text{if price is 50} \end{cases} \]

Hence, the expected return when \(p = \frac{1}{3}\) is

\[ E[\text{return}] = (50 - c)\frac{1}{3} - c\,\frac{2}{3} = \frac{50}{3} - c \]

Thus, it follows from the arbitrage theorem that the only value of c for which there will
not be a sure win is \(c = \frac{50}{3}\), which verifies the result of Section 10.4.1. ■
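
The same calculation can be written for a general two-price model. The following Python sketch assumes, as in the example, that all prices are present values and that the stock moves from s0 to one of two prices with down < s0 < up; the function names and parameters are hypothetical.

```python
from fractions import Fraction


def zero_return_probability(s0, up, down):
    """The p solving p*(up - s0) + (1 - p)*(down - s0) = 0."""
    return Fraction(s0 - down, up - down)


def no_arbitrage_option_cost(s0, up, down, strike):
    """Option cost that gives the option wager expected return 0 under that p."""
    p = zero_return_probability(s0, up, down)
    payoff_up = max(up - strike, 0)
    payoff_down = max(down - strike, 0)
    return p * payoff_up + (1 - p) * payoff_down


print(zero_return_probability(100, 200, 50))         # 1/3
print(no_arbitrage_option_cost(100, 200, 50, 150))   # 50/3
```

Note that the probability 1/3 is fixed by the stock wager alone; it is a device for pricing and need not be anyone's belief about the actual chance that the price rises.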
10.4.3 The Black-Scholes Option Pricing Formula

Suppose the present price of a stock is \(X(0) = x_0\), and let \(X(t)\) denote its price at
time t. Suppose we are interested in the stock over the time interval 0 to T. Assume
that the discount factor is \(\alpha\) (equivalently, the interest rate is \(100\alpha\) percent compounded
continuously), and so the present value of the stock price at time t is \(e^{-\alpha t}X(t)\).

We can regard the evolution of the price of the stock over time as our experiment,
and thus the outcome of the experiment is the value of the function \(X(t)\), \(0 \leq t \leq T\).
The types of wagers available are that for any s < t we can observe the process for a
time s and then buy (or sell) shares of the stock at price \(X(s)\) and then sell (or buy)
these shares at time t for the price \(X(t)\). In addition, we will suppose that we may
purchase any of N different options at time 0. Option i, costing \(c_i\) per share, gives us
the option of purchasing shares of the stock at time \(t_i\) for the fixed price of \(K_i\) per share,
i = 1, ..., N.

Suppose we want to determine values of the \(c_i\) for which there is no betting strategy
that leads to a sure win. Assuming that the arbitrage theorem can be generalized (to
handle the preceding situation, where the outcome of the experiment is a function), it
follows that there will be no sure win if and only if there exists a probability measure
over the set of outcomes under which all of the wagers have expected return 0. Let P
be a probability measure on the set of outcomes. Consider first the wager of observing
the stock for a time s and then purchasing (or selling) one share with the intention of
selling (or purchasing) it at time t, \(0 \leq s < t \leq T\). The present value of the amount
paid for the stock is \(e^{-\alpha s}X(s)\), whereas the present value of the amount received is
\(e^{-\alpha t}X(t)\). Hence, in order for the expected return of this wager to be 0 when P is the
probability measure on \(X(t)\), \(0 \leq t \leq T\), we must have

\[ E_P[e^{-\alpha t}X(t) \mid X(u),\ 0 \leq u \leq s] = e^{-\alpha s}X(s) \tag{10.9} \]

Consider now the wager of purchasing an option. Suppose the option gives us the right
to buy one share of the stock at time t for a price K. At time t, the worth of this option
will be as follows:

\[ \text{worth of option at time } t = \begin{cases} X(t) - K, & \text{if } X(t) \geq K \\ 0, & \text{if } X(t) < K \end{cases} \]

That is, the time t worth of the option is \((X(t) - K)^+\). Hence, the present value of the
worth of the option is \(e^{-\alpha t}(X(t) - K)^+\). If c is the (time 0) cost of the option, we see
that, in order for purchasing the option to have expected (present value) return 0, we
must have

\[ E_P[e^{-\alpha t}(X(t) - K)^+] = c \tag{10.10} \]

By the arbitrage theorem, if we can find a probability measure P on the set of outcomes
that satisfies Equation (10.9), then if c, the cost of an option to purchase one share at
time t at the fixed price K, is as given in Equation (10.10), then no arbitrage is possible.
On the other hand, if for given prices \(c_i\), i = 1, ..., N, there is no probability measure
P that satisfies both (10.9) and the equality

\[ c_i = E_P[e^{-\alpha t_i}(X(t_i) - K_i)^+], \quad i = 1, \ldots, N \]

then a sure win is possible.

We will now present a probability measure P on the outcome \(X(t)\), \(0 \leq t \leq T\), that
satisfies Equation (10.9).

Suppose that

\[ X(t) = x_0 e^{Y(t)} \]

where \(\{Y(t), t \geq 0\}\) is a Brownian motion process with drift coefficient \(\mu\) and variance
parameter \(\sigma^2\). That is, \(\{X(t), t \geq 0\}\) is a geometric Brownian motion process (see
Section 10.3.2). From Equation (10.8) we have that, for s < t,

\[ E[X(t) \mid X(u),\ 0 \leq u \leq s] = X(s)e^{(t-s)(\mu + \sigma^2/2)} \]

Hence, if we choose \(\mu\) and \(\sigma^2\) so that

\[ \mu + \sigma^2/2 = \alpha \]

then Equation (10.9) will be satisfied. That is, by letting P be the probability measure
governing the stochastic process \(\{x_0 e^{Y(t)}, 0 \leq t \leq T\}\), where \(\{Y(t)\}\) is Brownian
motion with drift parameter \(\mu\) and variance parameter \(\sigma^2\), and where \(\mu + \sigma^2/2 = \alpha\),
Equation (10.9) is satisfied.

It follows from the preceding that if we price an option to purchase a share of the
stock at time t for a fixed price K by

\[ c = E_P[e^{-\alpha t}(X(t) - K)^+] \]

then no arbitrage is possible. Since \(X(t) = x_0 e^{Y(t)}\), where \(Y(t)\) is normal with mean
\(\mu t\) and variance \(t\sigma^2\), we see that

\[ \begin{aligned}
ce^{\alpha t} &= \int_{-\infty}^{\infty} (x_0 e^{y} - K)^+ \frac{1}{\sqrt{2\pi t\sigma^2}} e^{-(y - \mu t)^2/2t\sigma^2}\,dy \\
&= \int_{\log(K/x_0)}^{\infty} (x_0 e^{y} - K) \frac{1}{\sqrt{2\pi t\sigma^2}} e^{-(y - \mu t)^2/2t\sigma^2}\,dy
\end{aligned} \]

Making the change of variable \(w = (y - \mu t)/(\sigma t^{1/2})\) yields

\[ ce^{\alpha t} = x_0 e^{\mu t} \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{\sigma w\sqrt{t}} e^{-w^2/2}\,dw - K \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-w^2/2}\,dw \tag{10.11} \]

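where

\[ a = \frac{\log(K/x_0) - \mu t}{\sigma\sqrt{t}} \]

Now,

\[ \begin{aligned}
\frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{\sigma w\sqrt{t}} e^{-w^2/2}\,dw
&= e^{t\sigma^2/2} \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-(w - \sigma\sqrt{t})^2/2}\,dw \\
&= e^{t\sigma^2/2} P\{N(\sigma\sqrt{t}, 1) \geq a\} \\
&= e^{t\sigma^2/2} P\{N(0, 1) \geq a - \sigma\sqrt{t}\} \\
&= e^{t\sigma^2/2} P\{N(0, 1) \leq -(a - \sigma\sqrt{t})\} \\
&= e^{t\sigma^2/2} \phi(\sigma\sqrt{t} - a)
\end{aligned} \]

where N(m, v) is a normal random variable with mean m and variance v, and \(\phi\) is the
standard normal distribution function.

Thus, we see from Equation (10.11) that

\[ ce^{\alpha t} = x_0 e^{\mu t + \sigma^2 t/2} \phi(\sigma\sqrt{t} - a) - K\phi(-a) \]

Using that

\[ \mu + \sigma^2/2 = \alpha \]

and letting b = -a, we can write this as follows:

\[ c = x_0 \phi(\sigma\sqrt{t} + b) - Ke^{-\alpha t}\phi(b) \tag{10.12} \]

where

\[ b = \frac{\alpha t - \sigma^2 t/2 - \log(K/x_0)}{\sigma\sqrt{t}} \]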

The option price formula given by Equation (10.12) depends on the initial price
of the stock \(x_0\), the option exercise time t, the option exercise price K, the discount
(or interest rate) factor \(\alpha\), and the value \(\sigma^2\). Note that for any value of \(\sigma^2\), if the
options are priced according to the formula of Equation (10.12) then no arbitrage is
possible. However, as many people believe that the price of a stock actually follows a
geometric Brownian motion (that is, \(X(t) = x_0 e^{Y(t)}\) where \(Y(t)\) is Brownian motion
with parameters \(\mu\) and \(\sigma^2\)), it has been suggested that it is natural to price the option
according to the formula of Equation (10.12) with the parameter \(\sigma^2\) taken equal to
the estimated value (see the remark that follows) of the variance parameter under the
assumption of a geometric Brownian motion model. When this is done, the formula of
Equation (10.12) is known as the Black-Scholes option cost valuation. It is interesting
that this valuation does not depend on the value of the drift parameter \(\mu\) but only on
the variance parameter \(\sigma^2\).
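
Equation (10.12) is straightforward to evaluate numerically. The following Python sketch is one possible implementation using only the standard library; the function names are hypothetical, and the inputs in the example (initial price 100, exercise price 100, exercise time 1, discount factor 0.05, sigma 0.2) are illustrative assumptions rather than values from the text.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def option_cost(x0, K, t, alpha, sigma):
    """Equation (10.12): c = x0*phi(sigma*sqrt(t) + b) - K*exp(-alpha*t)*phi(b)."""
    root_t = math.sqrt(t)
    b = (alpha * t - sigma ** 2 * t / 2.0 - math.log(K / x0)) / (sigma * root_t)
    return (x0 * std_normal_cdf(sigma * root_t + b)
            - K * math.exp(-alpha * t) * std_normal_cdf(b))


print(round(option_cost(100.0, 100.0, 1.0, 0.05, 0.2), 4))   # 10.4506
```

The drift parameter does not appear among the arguments, which reflects the observation above that the valuation depends only on the variance parameter.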

If the option itself can be traded, then the formula of Equation (10.12) can be used
to set its price in such a way that no arbitrage is possible. If at time s the price of
the stock is \(X(s) = x_s\), then the price of a (t, K) option (that is, an option to purchase
one unit of the stock at time t for a price K) should be set by replacing t by t - s and
\(x_0\) by \(x_s\) in Equation (10.12).

Remark If we observe a Brownian motion process with variance parameter \(\sigma^2\) over
any time interval, then we could theoretically obtain an arbitrarily precise estimate of
\(\sigma^2\). For suppose we observe such a process \(\{Y(s)\}\) for a time t. Then, for fixed h, let
N = [t/h] and set

\[ \begin{aligned}
W_1 &= Y(h) - Y(0), \\
W_2 &= Y(2h) - Y(h), \\
&\ \ \vdots \\
W_N &= Y(Nh) - Y(Nh - h)
\end{aligned} \]

Then the random variables \(W_1, \ldots, W_N\) are independent and identically distributed normal
random variables having variance \(h\sigma^2\). We now use the fact (see Section 3.6.4) that
\((N - 1)S^2/(\sigma^2 h)\) has a chi-squared distribution with N - 1 degrees of freedom, where
\(S^2\) is the sample variance defined by

\[ S^2 = \sum_{i=1}^{N} (W_i - \bar{W})^2/(N - 1) \]

Since the expected value and variance of a chi-squared with k degrees of freedom are
equal to k and 2k, respectively, we see that

\[ E[(N - 1)S^2/(\sigma^2 h)] = N - 1 \]

and

\[ \mathrm{Var}[(N - 1)S^2/(\sigma^2 h)] = 2(N - 1) \]

From this, we see that

\[ E[S^2/h] = \sigma^2 \]

and

\[ \mathrm{Var}[S^2/h] = 2\sigma^4/(N - 1) \]

Hence, as we let h become smaller (and so N = [t/h] becomes larger) the variance of
the unbiased estimator of \(\sigma^2\) becomes arbitrarily small. ■
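
The effect described in the remark can be seen in a small simulation. The sketch below generates the increments directly (rather than a whole path) and applies the estimator S^2/h; the function names, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random


def simulate_increments(mu, sigma, h, n, rng):
    """n independent increments W_i = Y(ih) - Y((i-1)h) of Brownian motion with drift.

    Each increment is normal with mean mu*h and variance sigma^2 * h.
    """
    return [rng.gauss(mu * h, sigma * math.sqrt(h)) for _ in range(n)]


def estimate_variance_parameter(increments, h):
    """S^2 / h, where S^2 is the sample variance of the increments."""
    n = len(increments)
    mean = sum(increments) / n
    s2 = sum((w - mean) ** 2 for w in increments) / (n - 1)
    return s2 / h


rng = random.Random(0)          # fixed seed, purely for reproducibility
t, mu, sigma = 1.0, 0.1, 0.3    # illustrative values; sigma^2 = 0.09
for n in (10, 100, 10_000):
    h = t / n
    est = estimate_variance_parameter(simulate_increments(mu, sigma, h, n, rng), h)
    sd = sigma ** 2 * math.sqrt(2.0 / (n - 1))   # square root of 2*sigma^4/(N - 1)
    print(n, round(est, 4), round(sd, 4))
```

The observation window t stays fixed at 1 throughout; only the spacing h shrinks, and the standard deviation of the estimator shrinks with it.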

Equation (10.12) is not the only way in which options can be priced so that no
arbitrage is possible. Let \(\{X(t), 0 \leq t \leq T\}\) be any stochastic process satisfying, for
s < t,

\[ E[e^{-\alpha t}X(t) \mid X(u),\ 0 \leq u \leq s] = e^{-\alpha s}X(s) \tag{10.13} \]

(that is, Equation (10.9) is satisfied). By setting c, the cost of an option to purchase one
share of the stock at time t for price K, equal to

\[ c = E[e^{-\alpha t}(X(t) - K)^+] \tag{10.14} \]

it follows that no arbitrage is possible.

Another type of stochastic process, aside from geometric Brownian motion, that
satisfies Equation (10.13) is obtained as follows. Let \(Y_1, Y_2, \ldots\) be a sequence of independent
random variables having a common mean \(\mu\), and suppose that this process is
independent of \(\{N(t), t \geq 0\}\), which is a Poisson process with rate \(\lambda\). Let

\[ X(t) = x_0 \prod_{i=1}^{N(t)} Y_i \]

Using the identity

\[ X(t) = x_0 \prod_{i=1}^{N(s)} Y_i \prod_{j=N(s)+1}^{N(t)} Y_j \]

and the independent increment assumption of the Poisson process, we see that, for
s < t,

\[ E[X(t) \mid X(u),\ 0 \leq u \leq s] = X(s)\, E\Big[\prod_{j=N(s)+1}^{N(t)} Y_j\Big] \]

Conditioning on the number of events between s and t yields

\[ \begin{aligned}
E\Big[\prod_{j=N(s)+1}^{N(t)} Y_j\Big] &= \sum_{n=0}^{\infty} \mu^n e^{-\lambda(t-s)}[\lambda(t-s)]^n/n! \\
&= e^{-\lambda(t-s)(1-\mu)}
\end{aligned} \]

Hence,

\[ E[X(t) \mid X(u),\ 0 \leq u \leq s] = X(s)e^{-\lambda(t-s)(1-\mu)} \]

Thus, if we choose \(\lambda\) and \(\mu\) to satisfy

\[ \lambda(1 - \mu) = -\alpha \]

then Equation (10.13) is satisfied. Therefore, if for any value of \(\lambda\) we let the \(Y_i\) have
any distributions with a common mean equal to \(\mu = 1 + \alpha/\lambda\) and then price the options
according to Equation (10.14), then no arbitrage is possible.
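
The series obtained by conditioning on the number of Poisson events can be checked numerically against its closed form. In the sketch below the function names are hypothetical and the values of the rate, the discount factor, and the elapsed time are illustrative assumptions.

```python
import math


def expected_growth_factor(lam, mu, tau, terms=200):
    """Partial sum of sum_n mu^n * exp(-lam*tau) * (lam*tau)^n / n! over n < terms."""
    total = 0.0
    term = math.exp(-lam * tau)            # the n = 0 term
    for n in range(terms):
        total += term
        term *= mu * lam * tau / (n + 1)   # ratio of term n + 1 to term n
    return total


def jump_mean_for_discount(alpha, lam):
    """Common mean mu = 1 + alpha/lam, so that lam*(1 - mu) = -alpha."""
    return 1.0 + alpha / lam


lam, alpha, tau = 2.0, 0.05, 1.5           # tau plays the role of t - s
mu = jump_mean_for_discount(alpha, lam)
print(expected_growth_factor(lam, mu, tau))
print(math.exp(-lam * tau * (1.0 - mu)))   # closed form
print(math.exp(alpha * tau))               # the growth that Equation (10.13) requires
```

All three printed values agree to within floating-point rounding, which is the content of the condition that the discounted process has constant conditional mean.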

Remark If \(\{X(t), t \geq 0\}\) satisfies Equation (10.13), then the process
\(\{e^{-\alpha t}X(t), t \geq 0\}\) is called a Martingale. Thus, any pricing of options for which the expected
gain on the option is equal to 0 when \(\{e^{-\alpha t}X(t)\}\) follows the probability law of some
Martingale will result in no arbitrage possibilities.

That is, if we choose any Martingale process \(\{Z(t)\}\) and let the cost of a (t, K)
option be

\[ \begin{aligned}
c &= E[e^{-\alpha t}(e^{\alpha t}Z(t) - K)^+] \\
&= E[(Z(t) - Ke^{-\alpha t})^+]
\end{aligned} \]

then there is no sure win.

In addition, while we did not consider the type of wager where a stock that is
purchased at time s is sold not at a fixed time t but rather at some random time that
depends on the movement of the stock, it can be shown using results about Martingales
that the expected return of such wagers is also equal to 0.

Remark A variation of the arbitrage theorem was first noted by de Finetti in 1937. A 
more general version of de Finetti’s result, of which the arbitrage theorem is a special 
case, is given in Reference 3. 


10.5 The Maximum of Brownian Motion with Drift 


For \(\{X(y), y \geq 0\}\) being a Brownian motion process with drift coefficient \(\mu\) and variance
parameter \(\sigma^2\), define

\[ M(t) = \max_{0 \leq y \leq t} X(y) \]

to be the maximal value of the process up to time t.

We will determine the distribution of M(t) by deriving the conditional distribution
of M(t) given the value of X(t). To do so, we first show that the conditional distribution
of \(X(y)\), \(0 \leq y \leq t\), given the value of X(t), does not depend on \(\mu\). That is, given
the value of the process at time t, the distribution of its history up to time t does not
depend on \(\mu\).

We start with a lemma. 


Lemma 10.1 If \(Y_1, \ldots, Y_n\) are independent and identically distributed normal random
variables with mean \(\theta\) and variance \(v^2\), then the conditional distribution of \(Y_1, \ldots, Y_n\)
given that \(\sum_{i=1}^{n} Y_i = x\) does not depend on \(\theta\).

Proof. Because, given \(\sum_{i=1}^{n} Y_i = x\), the value of \(Y_n\) is determined by knowledge of
those of \(Y_1, \ldots, Y_{n-1}\), it suffices to consider the conditional density of \(Y_1, \ldots, Y_{n-1}\)
given that \(\sum_{i=1}^{n} Y_i = x\). Letting \(X = \sum_{i=1}^{n} Y_i\), this is obtained as follows.

\[ f_{Y_1, \ldots, Y_{n-1} \mid X}(y_1, \ldots, y_{n-1} \mid x) = \frac{f_{Y_1, \ldots, Y_{n-1}, X}(y_1, \ldots, y_{n-1}, x)}{f_X(x)} \]

Now, because

\[ Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1},\ X = x \iff Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1},\ Y_n = x - \sum_{i=1}^{n-1} y_i \]

it follows that

\[ \begin{aligned}
f_{Y_1, \ldots, Y_{n-1}, X}(y_1, \ldots, y_{n-1}, x)
&= f_{Y_1, \ldots, Y_{n-1}, Y_n}\Big(y_1, \ldots, y_{n-1},\ x - \sum_{i=1}^{n-1} y_i\Big) \\
&= f_{Y_1}(y_1) \cdots f_{Y_{n-1}}(y_{n-1})\, f_{Y_n}\Big(x - \sum_{i=1}^{n-1} y_i\Big)
\end{aligned} \]






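where the last equality used that \(Y_1, \ldots, Y_n\) are independent. Hence, using that
\(X = \sum_{i=1}^{n} Y_i\) is normal with mean \(n\theta\) and variance \(nv^2\), we obtain

\[ \begin{aligned}
f_{Y_1, \ldots, Y_{n-1} \mid X}(y_1, \ldots, y_{n-1} \mid x)
&= \frac{f_{Y_n}\big(x - \sum_{i=1}^{n-1} y_i\big) f_{Y_1}(y_1) \cdots f_{Y_{n-1}}(y_{n-1})}{f_X(x)} \\
&= K\, \frac{e^{-(x - \sum_{i=1}^{n-1} y_i - \theta)^2/2v^2} \prod_{i=1}^{n-1} e^{-(y_i - \theta)^2/2v^2}}{e^{-(x - n\theta)^2/2nv^2}} \\
&= K \exp\Big\{ -\frac{1}{2v^2} \Big[ \Big(x - \sum_{i=1}^{n-1} y_i - \theta\Big)^2 + \sum_{i=1}^{n-1} (y_i - \theta)^2 - (x - n\theta)^2/n \Big] \Big\}
\end{aligned} \]

where K does not depend on \(\theta\). Expanding the squares in the preceding, and treating
everything that does not depend on \(\theta\) as a constant, shows that

\[ \begin{aligned}
f_{Y_1, \ldots, Y_{n-1} \mid X}(y_1, \ldots, y_{n-1} \mid x)
&= K' \exp\Big\{ -\frac{1}{2v^2} \Big[ -2\theta\Big(x - \sum_{i=1}^{n-1} y_i\Big) + \theta^2 - 2\theta \sum_{i=1}^{n-1} y_i + (n - 1)\theta^2 + 2\theta x - n\theta^2 \Big] \Big\} \\
&= K'
\end{aligned} \]

where \(K' = K'(v, y_1, \ldots, y_{n-1}, x)\) is a function that does not depend on \(\theta\). Thus the
result is proven. ■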

Remark Suppose that the distribution of random variables \(Y_1, \ldots, Y_n\) depends
on some parameter \(\theta\). Further, suppose that there is some function \(D(Y_1, \ldots, Y_n)\)
of \(Y_1, \ldots, Y_n\) such that the conditional distribution of \(Y_1, \ldots, Y_n\) given the value
of \(D(Y_1, \ldots, Y_n)\) does not depend on \(\theta\). Then it is said in statistical theory that
\(D(Y_1, \ldots, Y_n)\) is a sufficient statistic for \(\theta\). For suppose we wanted to use the data
\(Y_1, \ldots, Y_n\) to estimate the value of \(\theta\). Because, given the value of \(D(Y_1, \ldots, Y_n)\), the
conditional distribution of the data \(Y_1, \ldots, Y_n\) does not depend on \(\theta\), it follows that
if the value of \(D(Y_1, \ldots, Y_n)\) is known then no additional information about \(\theta\) can
be obtained from knowing all the data values \(Y_1, \ldots, Y_n\). Thus our preceding lemma
proves that the sum of the data values of independent and identically distributed normal
random variables is a sufficient statistic for their mean. (Because knowing the value of
the sum is equivalent to knowing the value of \(\sum_{i=1}^{n} Y_i/n\), called the sample mean, the
common terminology in statistics is that the sample mean is a sufficient statistic for the
mean of a normal population.) ■

Theorem 10.2 Let \(\{X(t), t \geq 0\}\) be a Brownian motion process with drift coefficient
\(\mu\) and variance parameter \(\sigma^2\). Given that X(t) = x, the conditional distribution of
\(X(y)\), \(0 \leq y \leq t\), is the same for all values of \(\mu\).

Proof. Fix n and set \(t_i = it/n\), i = 1, ..., n. To prove the theorem we will show for
any n that the conditional distribution of \(X(t_1), \ldots, X(t_n)\) given the value of X(t) does
not depend on \(\mu\). To do so, let \(Y_1 = X(t_1)\), \(Y_i = X(t_i) - X(t_{i-1})\), i = 2, ..., n, and note
that \(Y_1, \ldots, Y_n\) are independent and identically distributed normal random variables
with mean \(\theta = \mu t/n\). Because \(\sum_{i=1}^{n} Y_i = X(t)\) it follows from Lemma 10.1 that
the conditional distribution of \(Y_1, \ldots, Y_n\) given X(t) does not depend on \(\mu\). Because
knowing \(Y_1, \ldots, Y_n\) is equivalent to knowing \(X(t_1), \ldots, X(t_n)\) the result follows. ■

We now derive the conditional distribution of M(t) given the value of X(t). 

Theorem 10.3 For y > x

\[ P(M(t) \geq y \mid X(t) = x) = e^{-2y(y-x)/t\sigma^2}, \quad y \geq 0 \]

Proof. Because X(0) = 0 it follows that \(M(t) \geq 0\), and so the result is true when
y = 0 (since both sides are equal to 1 in this case). So suppose that y > 0. Because it
follows from Theorem 10.2 that \(P(M(t) \geq y \mid X(t) = x)\) does not depend on the value
of \(\mu\), let us suppose that \(\mu = 0\). Now, let \(T_y\) denote the first time that the Brownian
motion reaches the value y, and note that it follows from the continuity property of
Brownian motion that the event that \(M(t) \geq y\) is equivalent to the event that \(T_y \leq t\).
This is true because before the process can exceed the positive value y it must, by
continuity, first pass through that value. Now, let h be a small positive number for
which y > x + h. Then

\[ \begin{aligned}
P(M(t) \geq y,\ x \leq X(t) \leq x + h) &= P(T_y \leq t,\ x \leq X(t) \leq x + h) \\
&= P(x \leq X(t) \leq x + h \mid T_y \leq t)\,P(T_y \leq t)
\end{aligned} \]

Now, given \(T_y \leq t\), the event \(x \leq X(t) \leq x + h\) will occur if after hitting y the process
will decrease by an amount between y - x - h and y - x in the time between \(T_y\) and
t. But because \(\mu = 0\), in any period of time the process is just as likely to increase as
it is to decrease by an amount between y - x - h and y - x. Consequently,

\[ P(x \leq X(t) \leq x + h \mid T_y \leq t) = P(2y - x - h \leq X(t) \leq 2y - x \mid T_y \leq t) \]

which gives that

\[ \begin{aligned}
P(M(t) \geq y,\ x \leq X(t) \leq x + h) &= P(2y - x - h \leq X(t) \leq 2y - x \mid T_y \leq t)\,P(T_y \leq t) \\
&= P(2y - x - h \leq X(t) \leq 2y - x,\ T_y \leq t) \\
&= P(2y - x - h \leq X(t) \leq 2y - x)
\end{aligned} \]

where the final equation follows because the assumption y > x + h implies that
2y - x - h > y and so, by the continuity of Brownian motion, if \(2y - x - h \leq X(t)\)
then \(T_y \leq t\). Hence,

\[ \begin{aligned}
P(M(t) \geq y \mid x \leq X(t) \leq x + h) &= \frac{P(2y - x - h \leq X(t) \leq 2y - x)}{P(x \leq X(t) \leq x + h)} \\
&= \frac{f_{X(t)}(2y - x)h + o(h)}{f_{X(t)}(x)h + o(h)} \\
&= \frac{f_{X(t)}(2y - x) + o(h)/h}{f_{X(t)}(x) + o(h)/h}
\end{aligned} \]

where \(f_{X(t)}\), the density function of X(t), is the density function of a normal random
variable with mean 0 and variance \(t\sigma^2\). Letting \(h \to 0\) in the preceding gives

\[ \begin{aligned}
P(M(t) \geq y \mid X(t) = x) &= \frac{f_{X(t)}(2y - x)}{f_{X(t)}(x)} \\
&= \frac{e^{-(2y-x)^2/2t\sigma^2}}{e^{-x^2/2t\sigma^2}} \\
&= e^{-2y(y-x)/t\sigma^2}
\end{aligned} \]
■

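With Z being a standard normal random variable, and \(\Phi\) its distribution function, let

\[ \bar{\Phi}(x) = 1 - \Phi(x) = P(Z > x) \]

We now have

Corollary 10.1

\[ P(M(t) \geq y) = e^{2y\mu/\sigma^2}\, \bar{\Phi}\Big(\frac{\mu t + y}{\sigma\sqrt{t}}\Big) + \bar{\Phi}\Big(\frac{y - \mu t}{\sigma\sqrt{t}}\Big) \]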

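Proof. Conditioning on X(t) and using Theorem 10.3 yields

\[ \begin{aligned}
P(M(t) \geq y) &= \int_{-\infty}^{\infty} P(M(t) \geq y \mid X(t) = x) f_{X(t)}(x)\,dx \\
&= \int_{-\infty}^{y} P(M(t) \geq y \mid X(t) = x) f_{X(t)}(x)\,dx + \int_{y}^{\infty} f_{X(t)}(x)\,dx \\
&= \int_{-\infty}^{y} e^{-2y(y-x)/t\sigma^2} \frac{1}{\sqrt{2\pi t\sigma^2}} e^{-(x - \mu t)^2/2t\sigma^2}\,dx + P(X(t) > y) \\
&= \frac{1}{\sqrt{2\pi t}\,\sigma} e^{-2y^2/t\sigma^2} e^{-\mu^2 t^2/2t\sigma^2} \int_{-\infty}^{y} \exp\Big\{ -\frac{1}{2t\sigma^2} (x^2 - 2\mu t x - 4yx) \Big\}\,dx + P(X(t) > y) \\
&= \frac{1}{\sqrt{2\pi t}\,\sigma} e^{-(4y^2 + \mu^2 t^2)/2t\sigma^2} \int_{-\infty}^{y} \exp\Big\{ -\frac{1}{2t\sigma^2} (x^2 - 2x(\mu t + 2y)) \Big\}\,dx + P(X(t) > y)
\end{aligned} \]

Now,

\[ x^2 - 2x(\mu t + 2y) = (x - (\mu t + 2y))^2 - (\mu t + 2y)^2 \]

giving that

\[ P(M(t) \geq y) = e^{-(4y^2 + \mu^2 t^2 - (\mu t + 2y)^2)/2t\sigma^2} \frac{1}{\sqrt{2\pi t}\,\sigma} \int_{-\infty}^{y} e^{-(x - \mu t - 2y)^2/2t\sigma^2}\,dx + P(X(t) > y) \]

Making the change of variable

\[ w = \frac{x - \mu t - 2y}{\sigma\sqrt{t}}, \quad dx = \sigma\sqrt{t}\,dw \]

gives

\[ \begin{aligned}
P(M(t) \geq y) &= e^{2y\mu/\sigma^2} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(-\mu t - y)/\sigma\sqrt{t}} e^{-w^2/2}\,dw + P(X(t) > y) \\
&= e^{2y\mu/\sigma^2}\, \Phi\Big(\frac{-\mu t - y}{\sigma\sqrt{t}}\Big) + P(X(t) > y) \\
&= e^{2y\mu/\sigma^2}\, \bar{\Phi}\Big(\frac{\mu t + y}{\sigma\sqrt{t}}\Big) + \bar{\Phi}\Big(\frac{y - \mu t}{\sigma\sqrt{t}}\Big)
\end{aligned} \]

and the proof is complete. ■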

In the proof of Theorem 10.3 we let \(T_y\) denote the first time the Brownian motion is
equal to y. In addition, as previously noted, the continuity of Brownian motion implies
that, for y > 0, the process would have hit y by time t if and only if the maximum of
the process by time t was at least y. Consequently, for y > 0,

\[ T_y \leq t \iff M(t) \geq y \]

which, using Corollary 10.1, gives

\[ P(T_y \leq t) = e^{2y\mu/\sigma^2}\, \bar{\Phi}\Big(\frac{y + \mu t}{\sigma\sqrt{t}}\Big) + \bar{\Phi}\Big(\frac{y - \mu t}{\sigma\sqrt{t}}\Big), \quad y > 0 \]
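
The distribution of the maximum, and hence of the hitting time, can be evaluated directly from the formula of Corollary 10.1. The following Python sketch is illustrative only; the function names and the parameter values are assumptions.

```python
import math


def normal_tail(x):
    """P(Z > x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def prob_max_at_least(y, t, mu, sigma):
    """P(M(t) >= y), equivalently P(T_y <= t), for Brownian motion with drift; y >= 0."""
    if y < 0:
        raise ValueError("the formula is stated for y >= 0")
    s = sigma * math.sqrt(t)
    return (math.exp(2.0 * y * mu / sigma ** 2) * normal_tail((mu * t + y) / s)
            + normal_tail((y - mu * t) / s))


# Illustrative values: the chance of reaching level 1 by time 2 under three drifts.
for mu in (-0.5, 0.0, 0.5):
    print(mu, round(prob_max_at_least(1.0, 2.0, mu, 1.5), 4))
```

Two sanity checks follow from the formula itself: at y = 0 the probability is 1, and with zero drift it reduces to twice the tail probability of X(t), which is the reflection argument used in the proof of Theorem 10.3.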


10.6 White Noise 


Let \(\{X(t), t \geq 0\}\) denote a standard Brownian motion process and let f be a function
having a continuous derivative in the region [a, b]. The stochastic integral \(\int_a^b f(t)\,dX(t)\)
is defined as follows:

\[ \int_a^b f(t)\,dX(t) = \lim_{\substack{n \to \infty \\ \max(t_i - t_{i-1}) \to 0}} \sum_{i=1}^{n} f(t_{i-1})[X(t_i) - X(t_{i-1})] \tag{10.15} \]

where \(a = t_0 < t_1 < \cdots < t_n = b\) is a partition of the region [a, b]. Using the identity
(the integration by parts formula applied to sums)

\[ \sum_{i=1}^{n} f(t_{i-1})[X(t_i) - X(t_{i-1})] = f(b)X(b) - f(a)X(a) - \sum_{i=1}^{n} X(t_i)[f(t_i) - f(t_{i-1})] \]

we see that

\[ \int_a^b f(t)\,dX(t) = f(b)X(b) - f(a)X(a) - \int_a^b X(t)\,df(t) \tag{10.16} \]

Equation (10.16) is usually taken as the definition of \(\int_a^b f(t)\,dX(t)\).

By using the right side of Equation (10.16) we obtain, upon assuming the interchangeability
of expectation and limit, that

\[ E\Big[\int_a^b f(t)\,dX(t)\Big] = 0 \]

Also,

\[ \begin{aligned}
\mathrm{Var}\Big(\sum_{i=1}^{n} f(t_{i-1})[X(t_i) - X(t_{i-1})]\Big) &= \sum_{i=1}^{n} f^2(t_{i-1})\,\mathrm{Var}[X(t_i) - X(t_{i-1})] \\
&= \sum_{i=1}^{n} f^2(t_{i-1})(t_i - t_{i-1})
\end{aligned} \]

where the top equality follows from the independent increments of Brownian motion.
Hence, we obtain from Equation (10.15) upon taking limits of the preceding that

\[ \mathrm{Var}\Big[\int_a^b f(t)\,dX(t)\Big] = \int_a^b f^2(t)\,dt \]
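
The mean and variance just obtained can be checked by simulating the approximating sums of Equation (10.15). The sketch below uses a uniform partition and the test function f(t) = t on [0, 1], for which the variance should be close to 1/3; the function names, the seed, and the sample sizes are illustrative assumptions.

```python
import math
import random


def stochastic_integral_sum(f, a, b, n, rng):
    """Approximating sum of (10.15) on a uniform partition of [a, b] into n pieces."""
    dt = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        dx = rng.gauss(0.0, math.sqrt(dt))   # X(t_i) - X(t_{i-1}) for standard Brownian motion
        total += f(a + (i - 1) * dt) * dx    # f evaluated at the left endpoint t_{i-1}
    return total


def sample_mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


rng = random.Random(0)   # fixed seed, purely for reproducibility
samples = [stochastic_integral_sum(lambda t: t, 0.0, 1.0, 100, rng) for _ in range(2000)]
mean, var = sample_mean_and_variance(samples)
print(round(mean, 3), round(var, 3))   # compare with 0 and 1/3
```

Evaluating f at the left endpoint of each subinterval matches the definition in Equation (10.15); the small gap between the simulated variance and 1/3 combines sampling error with the discretization of the partition.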


Remark The preceding gives operational meaning to the family of quantities
\(\{dX(t), 0 \leq t < \infty\}\) by viewing it as an operator that carries functions f into the values
\(\int_a^b f(t)\,dX(t)\). This is called a white noise transformation, or more loosely
\(\{dX(t), 0 \leq t < \infty\}\) is called white noise since it can be imagined that a time varying function
f travels through a white noise medium to yield the output (at time b) \(\int_a^b f(t)\,dX(t)\).

Example 10.4 Consider a particle of unit mass that is suspended in a liquid and 
suppose that, due to the liquid, there is a viscous force that retards the velocity of the 
particle at a rate proportional to its present velocity. In addition, let us suppose that the 
velocity instantaneously changes according to a constant multiple of white noise. That 
is, if V(t) denotes the particle’s velocity at t, suppose that

\[ V'(t) = -\beta V(t) + \alpha X'(t) \]

where \(\{X(t), t \geq 0\}\) is standard Brownian motion. This can be written as follows:

\[ e^{\beta t}[V'(t) + \beta V(t)] = \alpha e^{\beta t}X'(t) \]

or

\[ \frac{d}{dt}[e^{\beta t}V(t)] = \alpha e^{\beta t}X'(t) \]

Hence, upon integration, we obtain

\[ e^{\beta t}V(t) = V(0) + \alpha \int_0^t e^{\beta s}X'(s)\,ds \]

or

\[ V(t) = V(0)e^{-\beta t} + \alpha \int_0^t e^{-\beta(t-s)}\,dX(s) \]

Hence, from Equation (10.16),

\[ V(t) = V(0)e^{-\beta t} + \alpha \Big[ X(t) - \int_0^t X(s)\beta e^{-\beta(t-s)}\,ds \Big] \]

10.7 Gaussian Processes 

We start with the following definition. 

Definition 10.2 A stochastic process \(X(t), t \geq 0\) is called a Gaussian, or a normal,
process if \(X(t_1), \ldots, X(t_n)\) has a multivariate normal distribution for all \(t_1, \ldots, t_n\).

If \(\{X(t), t \geq 0\}\) is a Brownian motion process, then because each of
\(X(t_1), X(t_2), \ldots, X(t_n)\) can be expressed as a linear combination of the independent normal random
variables \(X(t_1), X(t_2) - X(t_1), X(t_3) - X(t_2), \ldots, X(t_n) - X(t_{n-1})\) it follows that
Brownian motion is a Gaussian process.

Because a multivariate normal distribution is completely determined by the marginal
mean values and the covariance values (see Section 2.6) it follows that standard
Brownian motion could also be defined as a Gaussian process having E[X(t)] = 0
and, for \(s \leq t\),

\[ \begin{aligned}
\mathrm{Cov}(X(s), X(t)) &= \mathrm{Cov}(X(s), X(s) + X(t) - X(s)) \\
&= \mathrm{Cov}(X(s), X(s)) + \mathrm{Cov}(X(s), X(t) - X(s)) \\
&= \mathrm{Cov}(X(s), X(s)) \quad \text{by independent increments} \\
&= s \quad \text{since } \mathrm{Var}(X(s)) = s
\end{aligned} \tag{10.17} \]

Let \(\{X(t), t \geq 0\}\) be a standard Brownian motion process and consider the process
values between 0 and 1 conditional on X(1) = 0. That is, consider the conditional
stochastic process \(\{X(t), 0 \leq t \leq 1 \mid X(1) = 0\}\). Since the conditional distribution of
\(X(t_1), \ldots, X(t_n)\) is multivariate normal it follows that this conditional process, known
as the Brownian bridge (as it is tied down both at 0 and at 1), is a Gaussian process.
Let us compute its covariance function. As, from Equation (10.4),

\[ E[X(s) \mid X(1) = 0] = 0, \quad \text{for } s < 1 \]

we have that, for s < t < 1,

\[ \begin{aligned}
\mathrm{Cov}[(X(s), X(t)) \mid X(1) = 0] &= E[X(s)X(t) \mid X(1) = 0] \\
&= E[E[X(s)X(t) \mid X(t), X(1) = 0] \mid X(1) = 0] \\
&= E[X(t)E[X(s) \mid X(t)] \mid X(1) = 0] \\
&= E\Big[X(t)\frac{s}{t}X(t) \,\Big|\, X(1) = 0\Big] \quad \text{by (10.4)} \\
&= \frac{s}{t}E[X^2(t) \mid X(1) = 0] \\
&= \frac{s}{t}\,t(1 - t) \quad \text{by (10.4)} \\
&= s(1 - t)
\end{aligned} \]

Thus, the Brownian bridge can be defined as a Gaussian process with mean value 0 and
covariance function s(1 - t), \(s \leq t\). This leads to an alternative approach to obtaining
such a process.

Proposition 10.1 If $\{X(t), t \ge 0\}$ is standard Brownian motion, then $\{Z(t), 0 \le
t \le 1\}$ is a Brownian bridge process when $Z(t) = X(t) - tX(1)$.

Proof. As it is immediate that $\{Z(t), t \ge 0\}$ is a Gaussian process, all we need
verify is that $E[Z(t)] = 0$ and $\mathrm{Cov}(Z(s), Z(t)) = s(1 - t)$, when $s \le t$. The former
is immediate and the latter follows from

\[
\begin{aligned}
\mathrm{Cov}(Z(s), Z(t)) &= \mathrm{Cov}(X(s) - sX(1), X(t) - tX(1)) \\
&= \mathrm{Cov}(X(s), X(t)) - t\,\mathrm{Cov}(X(s), X(1)) \\
&\quad - s\,\mathrm{Cov}(X(1), X(t)) + st\,\mathrm{Cov}(X(1), X(1)) \\
&= s - st - st + st \\
&= s(1 - t)
\end{aligned}
\]


and the proof is complete. 
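
As a rough numerical illustration of Proposition 10.1 (a sketch only, not part of the text's argument), the following builds $X(s)$, $X(t)$, and $X(1)$ from independent normal increments, forms $Z(u) = X(u) - uX(1)$, and averages $Z(s)Z(t)$ over many runs. The function name, run count, and seed are arbitrary choices made for this example.

```python
import random

def brownian_bridge_cov(s, t, runs=20000, seed=1):
    """Estimate Cov(Z(s), Z(t)) for Z(u) = X(u) - u*X(1), with 0 <= s <= t <= 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # Standard Brownian motion at times s, t, 1 via independent increments.
        xs = rng.gauss(0.0, s ** 0.5)
        xt = xs + rng.gauss(0.0, (t - s) ** 0.5)
        x1 = xt + rng.gauss(0.0, (1.0 - t) ** 0.5)
        zs = xs - s * x1
        zt = xt - t * x1
        # E[Z(u)] = 0, so the average product estimates the covariance.
        total += zs * zt
    return total / runs

s, t = 0.3, 0.7
estimate = brownian_bridge_cov(s, t)
print(estimate, s * (1 - t))  # the two values should be close
```

The estimate is subject to sampling error, so agreement with $s(1 - t)$ is only approximate.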

If $\{X(t), t \ge 0\}$ is Brownian motion, then the process $\{Z(t), t \ge 0\}$ defined by

\[ Z(t) = \int_0^t X(s)\,ds \tag{10.18} \]

is called integrated Brownian motion. As an illustration of how such a process may 
arise in practice, suppose we are interested in modeling the price of a commodity 
throughout time. Letting $Z(t)$ denote the price at $t$ then, rather than assuming that
$\{Z(t)\}$ is Brownian motion (or that $\log Z(t)$ is Brownian motion), we might want to
assume that the rate of change of $Z(t)$ follows a Brownian motion. For instance, we
might suppose that the rate of change of the commodity’s price is the current inflation
rate, which is imagined to vary as Brownian motion. Hence,

\[ \frac{d}{dt}Z(t) = X(t), \qquad Z(t) = Z(0) + \int_0^t X(s)\,ds \]


It follows from the fact that Brownian motion is a Gaussian process that $\{Z(t), t \ge 0\}$
is also Gaussian. To prove this, first recall that $W_1, \ldots, W_n$ is said to have a multivariate
normal distribution if they can be represented as

\[ W_i = \sum_{j=1}^{m} a_{ij}U_j, \quad i = 1, \ldots, n \]

where $U_j$, $j = 1, \ldots, m$ are independent normal random variables. From this it follows
that any set of partial sums of $W_1, \ldots, W_n$ are also jointly normal. The fact that
$Z(t_1), \ldots, Z(t_n)$ is multivariate normal can now be shown by writing the integral in
Equation (10.18) as a limit of approximating sums.

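As $\{Z(t), t \ge 0\}$ is Gaussian it follows that its distribution is characterized by its
mean value and covariance function. We now compute these when $\{X(t), t \ge 0\}$ is
standard Brownian motion.

\[ E[Z(t)] = E\left[\int_0^t X(s)\,ds\right] = \int_0^t E[X(s)]\,ds = 0 \]

For $s \le t$,

\[
\begin{aligned}
\mathrm{Cov}[Z(s), Z(t)] &= E[Z(s)Z(t)] \\
&= E\left[\int_0^s X(u)\,du \int_0^t X(y)\,dy\right] \\
&= \int_0^s\int_0^t E[X(y)X(u)]\,dy\,du \\
&= \int_0^s\int_0^t \min(y, u)\,dy\,du \quad \text{by (10.17)} \\
&= \int_0^s\left(\int_0^u y\,dy + \int_u^t u\,dy\right)du \\
&= s^2\left(\frac{t}{2} - \frac{s}{6}\right)
\end{aligned}
\]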
10.8 Stationary and Weakly Stationary Processes

A stochastic process $\{X(t), t \ge 0\}$ is said to be a stationary process if for all $n$, $s$,
$t_1, \ldots, t_n$ the random vectors $X(t_1), \ldots, X(t_n)$ and $X(t_1 + s), \ldots, X(t_n + s)$ have the
same joint distribution. In other words, a process is stationary if, in choosing any fixed
point $s$ as the origin, the ensuing process has the same probability law. Two examples
of stationary processes are:

(i) An ergodic continuous-time Markov chain $\{X(t), t \ge 0\}$ when

\[ P\{X(0) = j\} = P_j, \quad j \ge 0 \]

where $\{P_j, j \ge 0\}$ are the limiting probabilities.

(ii) $\{X(t), t \ge 0\}$ when $X(t) = N(t + L) - N(t)$, $t \ge 0$, where $L > 0$ is a fixed
constant and $\{N(t), t \ge 0\}$ is a Poisson process having rate $\lambda$.

The first one of these processes is stationary for it is a Markov chain whose initial
state is chosen according to the limiting probabilities, and it can thus be regarded as an
ergodic Markov chain that we start observing at time $\infty$. Hence, the continuation of this
process at time $s$ after observation begins is just the continuation of the chain starting at
time $\infty + s$, which clearly has the same probability for all $s$. That the second example,
where $X(t)$ represents the number of events of a Poisson process that occur between
$t$ and $t + L$, is stationary follows from the stationary and independent increment
assumption of the Poisson process, which implies that the continuation of a Poisson
process at any time $s$ remains a Poisson process.

Example 10.5 (The Random Telegraph Signal Process) Let $\{N(t), t \ge 0\}$ denote a
Poisson process, and let $X_0$ be independent of this process and be such that
$P\{X_0 = 1\} = P\{X_0 = -1\} = \frac{1}{2}$. Defining $X(t) = X_0(-1)^{N(t)}$ then $\{X(t), t \ge 0\}$ is
called a random telegraph signal process. To see that it is stationary, note first that
starting at any time $t$, no matter what the value of $N(t)$, as $X_0$ is equally likely to be
either plus or minus 1, it follows that $X(t)$ is equally likely to be either plus or minus 1.
Hence, because the continuation of a Poisson process beyond any time remains a Poisson
process, it follows that $\{X(t), t \ge 0\}$ is a stationary process.

Let us compute the mean and covariance function of the random telegraph signal.

\[
\begin{aligned}
E[X(t)] &= E[X_0(-1)^{N(t)}] \\
&= E[X_0]E[(-1)^{N(t)}] \quad \text{by independence} \\
&= 0 \quad \text{since } E[X_0] = 0, \\
\mathrm{Cov}[X(t), X(t + s)] &= E[X(t)X(t + s)] \\
&= E[X_0^2(-1)^{N(t) + N(t+s)}] \\
&= E[(-1)^{2N(t)}(-1)^{N(t+s) - N(t)}] \\
&= E[(-1)^{N(t+s) - N(t)}] \\
&= E[(-1)^{N(s)}] \\
&= \sum_{i=0}^{\infty}(-1)^i e^{-\lambda s}\frac{(\lambda s)^i}{i!} \\
&= e^{-2\lambda s}
\end{aligned}
\tag{10.19}
\]
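
The last step of (10.19) sums an alternating Poisson series. As a small sanity check (a sketch; the function name and the number of terms are choices made here, not part of the text), the partial sums can be compared numerically with $e^{-2\lambda s}$:

```python
import math

def telegraph_cov_series(lam, s, terms=100):
    """Partial sum of sum_i (-1)^i e^{-lam*s} (lam*s)^i / i!, i.e. E[(-1)^{N(s)}]."""
    total = 0.0
    term = math.exp(-lam * s)  # magnitude of the i = 0 term
    for i in range(terms):
        total += term if i % 2 == 0 else -term
        term *= lam * s / (i + 1)  # magnitude of the next Poisson term
    return total

lam, s = 1.5, 0.8
print(telegraph_cov_series(lam, s), math.exp(-2 * lam * s))
```

For moderate values of $\lambda s$ the two printed numbers agree to within floating-point rounding.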


For an application of the random telegraph signal consider a particle moving at a
constant unit velocity along a straight line and suppose that collisions involving this
particle occur at a Poisson rate $\lambda$. Also suppose that each time the particle suffers a
collision it reverses direction. Therefore, if $X_0$ represents the initial velocity of the
particle, then its velocity at time $t$ (call it $X(t)$) is given by $X(t) = X_0(-1)^{N(t)}$,
where $N(t)$ denotes the number of collisions involving the particle by time $t$. Hence,
if $X_0$ is equally likely to be plus or minus 1, and is independent of $\{N(t), t \ge 0\}$, then
$\{X(t), t \ge 0\}$ is a random telegraph signal process. If we now let

\[ D(t) = \int_0^t X(s)\,ds \]

then $D(t)$ represents the displacement of the particle at time $t$ from its position at time 0.
The mean and variance of $D(t)$ are obtained as follows:

\[
\begin{aligned}
E[D(t)] &= \int_0^t E[X(s)]\,ds = 0, \\
\mathrm{Var}[D(t)] &= E[D^2(t)] \\
&= E\left[\int_0^t X(y)\,dy \int_0^t X(u)\,du\right] \\
&= \int_0^t\int_0^t E[X(y)X(u)]\,dy\,du \\
&= 2\iint_{0<y<u<t} E[X(y)X(u)]\,dy\,du \\
&= 2\int_0^t\int_0^u e^{-2\lambda(u-y)}\,dy\,du \quad \text{by (10.19)} \\
&= \frac{1}{\lambda}\left(t - \frac{1}{2\lambda} + \frac{1}{2\lambda}e^{-2\lambda t}\right)
\end{aligned}
\]

The condition for a process to be stationary is rather stringent and so we define the
process $\{X(t), t \ge 0\}$ to be a second-order stationary or a weakly stationary process
if $E[X(t)] = c$ and $\mathrm{Cov}[X(t), X(t + s)]$ does not depend on $t$. That is, a process is
second-order stationary if the first two moments of $X(t)$ are the same for all $t$ and
the covariance between $X(s)$ and $X(t)$ depends only on $|t - s|$. For a second-order
stationary process, let

\[ R(s) = \mathrm{Cov}[X(t), X(t + s)] \]

As the finite dimensional distributions of a Gaussian process (being multivariate normal)
are determined by their means and covariance, it follows that a second-order
stationary Gaussian process is stationary.
Example 10.6 (The Ornstein-Uhlenbeck Process) Let $\{X(t), t \ge 0\}$ be a standard
Brownian motion process, and define, for $\alpha > 0$,

\[ V(t) = e^{-\alpha t/2}X(e^{\alpha t}) \]

The process $\{V(t), t \ge 0\}$ is called the Ornstein-Uhlenbeck process. It has been
proposed as a model for describing the velocity of a particle immersed in a liquid
or gas, and as such is useful in statistical mechanics. Let us compute its mean and
covariance function.

\[
\begin{aligned}
E[V(t)] &= 0, \\
\mathrm{Cov}[V(t), V(t + s)] &= e^{-\alpha t/2}e^{-\alpha(t+s)/2}\,\mathrm{Cov}[X(e^{\alpha t}), X(e^{\alpha(t+s)})] \\
&= e^{-\alpha t}e^{-\alpha s/2}e^{\alpha t} \quad \text{by Equation (10.17)} \\
&= e^{-\alpha s/2}
\end{aligned}
\]

Hence, $\{V(t), t \ge 0\}$ is weakly stationary and as it is clearly a Gaussian process
(since Brownian motion is Gaussian) we can conclude that it is stationary. It is
interesting to note that (with $\alpha = 4\lambda$) it has the same mean and covariance function as the
random telegraph signal process, thus illustrating that two quite different processes can
have the same second-order properties. (Of course, if two Gaussian processes have the
same mean and covariance functions then they are identically distributed.) ■

As the following examples show, there are many types of second-order stationary 
processes that are not stationary. 

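Example 10.7 (An Autoregressive Process) Let $Z_0, Z_1, Z_2, \ldots$ be uncorrelated
random variables with $E[Z_n] = 0$, $n \ge 0$ and

\[
\mathrm{Var}(Z_n) =
\begin{cases}
\sigma^2/(1 - \lambda^2), & n = 0 \\
\sigma^2, & n \ge 1
\end{cases}
\]

where $\lambda^2 < 1$. Define

\[
\begin{aligned}
X_0 &= Z_0, \\
X_n &= \lambda X_{n-1} + Z_n, \quad n \ge 1
\end{aligned}
\tag{10.20}
\]

The process $\{X_n, n \ge 0\}$ is called a first-order autoregressive process. It says that the
state at time $n$ (that is, $X_n$) is a constant multiple of the state at time $n - 1$ plus a random
error term $Z_n$.

Iterating Equation (10.20) yields

\[
\begin{aligned}
X_n &= \lambda(\lambda X_{n-2} + Z_{n-1}) + Z_n \\
&= \lambda^2 X_{n-2} + \lambda Z_{n-1} + Z_n \\
&\;\;\vdots \\
&= \sum_{i=0}^{n}\lambda^{n-i}Z_i
\end{aligned}
\]

and so

\[
\begin{aligned}
\mathrm{Cov}(X_n, X_{n+m}) &= \mathrm{Cov}\left(\sum_{i=0}^{n}\lambda^{n-i}Z_i,\ \sum_{i=0}^{n+m}\lambda^{n+m-i}Z_i\right) \\
&= \sum_{i=0}^{n}\lambda^{n-i}\lambda^{n+m-i}\,\mathrm{Cov}(Z_i, Z_i) \\
&= \sigma^2\lambda^{2n+m}\left(\frac{1}{1 - \lambda^2} + \sum_{i=1}^{n}\lambda^{-2i}\right) \\
&= \frac{\sigma^2\lambda^m}{1 - \lambda^2}
\end{aligned}
\]

where the preceding uses the fact that $Z_i$ and $Z_j$ are uncorrelated when $i \ne j$. As
$E[X_n] = 0$, we see that $\{X_n, n \ge 0\}$ is weakly stationary (the definition for a discrete
time process is the obvious analog of that given for continuous time processes). ■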
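
The covariance of the autoregressive process can be checked directly from the representation $X_n = \sum_{i=0}^{n}\lambda^{n-i}Z_i$. The sketch below (the function and variable names are illustrative assumptions) evaluates the finite sum from the derivation and compares it with the closed form $\sigma^2\lambda^m/(1 - \lambda^2)$ for several values of $n$, which shows that the result does not depend on $n$.

```python
def ar1_cov(lam, sigma2, n, m):
    """Cov(X_n, X_{n+m}) from X_n = sum_{i=0}^n lam^(n-i) Z_i with uncorrelated Z_i."""
    # Var(Z_0) = sigma2 / (1 - lam^2); Var(Z_i) = sigma2 for i >= 1.
    var_z = [sigma2 / (1 - lam ** 2)] + [sigma2] * n
    # Only the matching Z_i terms contribute, since the Z's are uncorrelated.
    return sum(lam ** (n - i) * lam ** (n + m - i) * var_z[i] for i in range(n + 1))

lam, sigma2, m = 0.6, 2.0, 3
closed_form = sigma2 * lam ** m / (1 - lam ** 2)
for n in (0, 1, 5, 20):
    print(n, ar1_cov(lam, sigma2, n, m), closed_form)
```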

Example 10.8 If, in the random telegraph signal process, we drop the requirement
that $P\{X_0 = 1\} = P\{X_0 = -1\} = \frac{1}{2}$ and only require that $E[X_0] = 0$, then the
process $\{X(t), t \ge 0\}$ need no longer be stationary. (It will remain stationary if $X_0$
has a symmetric distribution in the sense that $-X_0$ has the same distribution as $X_0$.)
However, the process will be weakly stationary since

\[
\begin{aligned}
E[X(t)] &= E[X_0]E[(-1)^{N(t)}] = 0, \\
\mathrm{Cov}[X(t), X(t + s)] &= E[X(t)X(t + s)] \\
&= E[X_0^2]E[(-1)^{N(t) + N(t+s)}] \\
&= E[X_0^2]e^{-2\lambda s} \quad \text{from (10.19)} \quad ■
\end{aligned}
\]


Example 10.9 Let $W_0, W_1, W_2, \ldots$ be uncorrelated with $E[W_n] = \mu$ and $\mathrm{Var}(W_n) =
\sigma^2$, $n \ge 0$, and for some positive integer $k$ define

\[ X_n = \frac{W_n + W_{n-1} + \cdots + W_{n-k}}{k + 1}, \quad n \ge k \]

The process $\{X_n, n \ge k\}$, which at each time keeps track of the arithmetic average of
the most recent $k + 1$ values of the $W$s, is called a moving average process. Using the
fact that the $W_n$, $n \ge 0$ are uncorrelated, we see that

\[
\mathrm{Cov}(X_n, X_{n+m}) =
\begin{cases}
\dfrac{(k + 1 - m)\sigma^2}{(k + 1)^2}, & \text{if } 0 \le m \le k \\
0, & \text{if } m > k
\end{cases}
\]

Hence, $\{X_n, n \ge k\}$ is a second-order stationary process. ■
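
The moving average covariance comes from counting how many $W$s the two averaging windows share, since only shared terms contribute. A minimal sketch of that counting argument follows; the function name is hypothetical and the choice $n = k$ is arbitrary.

```python
def moving_average_cov(k, sigma2, m):
    """Cov(X_n, X_{n+m}) for X_n = (W_n + ... + W_{n-k}) / (k + 1), W's uncorrelated."""
    n = k  # any n >= k gives the same answer; n = k is an arbitrary choice
    window_n = set(range(n - k, n + 1))           # indices averaged in X_n
    window_nm = set(range(n + m - k, n + m + 1))  # indices averaged in X_{n+m}
    shared = len(window_n & window_nm)            # each shared W contributes sigma2
    return shared * sigma2 / (k + 1) ** 2

k, sigma2 = 4, 3.0
for m in range(0, 8):
    formula = (k + 1 - m) * sigma2 / (k + 1) ** 2 if m <= k else 0.0
    print(m, moving_average_cov(k, sigma2, m), formula)
```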

Let $\{X_n, n \ge 1\}$ be a second-order stationary process with $E[X_n] = \mu$. An important
question is when, if ever, does $\bar{X}_n \equiv \sum_{i=1}^{n} X_i/n$ converge to $\mu$? The following
proposition, which we state without proof, shows that $E[(\bar{X}_n - \mu)^2] \to 0$ if and only
if $\sum_{i=1}^{n} R(i)/n \to 0$. That is, the expected square of the difference between $\bar{X}_n$ and $\mu$
will converge to 0 if and only if the limiting average value of $R(i)$ converges to 0.

Proposition 10.2 Let $\{X_n, n \ge 1\}$ be a second-order stationary process having mean
$\mu$ and covariance function $R(i) = \mathrm{Cov}(X_n, X_{n+i})$, and let $\bar{X}_n \equiv \sum_{i=1}^{n} X_i/n$. Then
$\lim_{n\to\infty} E[(\bar{X}_n - \mu)^2] = 0$ if and only if $\lim_{n\to\infty} \sum_{i=1}^{n} R(i)/n = 0$.

10.9 Harmonic Analysis of Weakly Stationary Processes 

Suppose that the stochastic processes $\{X(t), -\infty < t < \infty\}$ and $\{Y(t), -\infty < t
< \infty\}$ are related as follows:

\[ Y(t) = \int X(t - s)h(s)\,ds \tag{10.21} \]

We can imagine that a signal, whose value at time $t$ is $X(t)$, is passed through a physical
system that distorts its value so that $Y(t)$, the received value at $t$, is given by Equation
(10.21). The processes $\{X(t)\}$ and $\{Y(t)\}$ are called, respectively, the input and output
processes. The function $h$ is called the impulse response function. If $h(s) = 0$ whenever
$s < 0$, then $h$ is also called a weighting function since Equation (10.21) expresses the
output at $t$ as a weighted integral of all the inputs prior to $t$ with $h(s)$ representing the
weight given the input $s$ time units ago.

The relationship expressed by Equation (10.21) is a special case of a time invariant
linear filter. It is called a filter because we can imagine that the input process $\{X(t)\}$
is passed through a medium and then filtered to yield the output process $\{Y(t)\}$. It
is a linear filter because if the input processes $\{X_i(t)\}$, $i = 1, 2$, result in the output
processes $\{Y_i(t)\}$ (that is, if $Y_i(t) = \int_0^\infty X_i(t - s)h(s)\,ds$) then the output process
corresponding to the input process $\{aX_1(t) + bX_2(t)\}$ is just $\{aY_1(t) + bY_2(t)\}$. It is
called time invariant since lagging the input process by a time $\tau$ (that is, considering
the new input process $\bar{X}(t) = X(t + \tau)$) results in a lag of $\tau$ in the output process
since

\[ \int_0^\infty \bar{X}(t - s)h(s)\,ds = \int_0^\infty X(t + \tau - s)h(s)\,ds = Y(t + \tau) \]

Let us now suppose that the input process $\{X(t), -\infty < t < \infty\}$ is weakly
stationary with $E[X(t)] = 0$ and covariance function

\[ R_X(s) = \mathrm{Cov}[X(t), X(t + s)] \]

Let us compute the mean value and covariance function of the output process $\{Y(t)\}$.

Assuming that we can interchange the expectation and integration operations (a
sufficient condition being that $\int |h(s)|\,ds < \infty$* and, for some $M < \infty$, $E|X(t)| < M$
for all $t$) we obtain

\[ E[Y(t)] = \int E[X(t - s)]h(s)\,ds = 0 \]

* The range of all integrals in this section is from $-\infty$ to $+\infty$.
Similarly,

\[
\begin{aligned}
\mathrm{Cov}[Y(t_1), Y(t_2)] &= \mathrm{Cov}\left[\int X(t_1 - s_1)h(s_1)\,ds_1,\ \int X(t_2 - s_2)h(s_2)\,ds_2\right] \\
&= \iint \mathrm{Cov}[X(t_1 - s_1), X(t_2 - s_2)]h(s_1)h(s_2)\,ds_1\,ds_2 \\
&= \iint R_X(t_2 - s_2 - t_1 + s_1)h(s_1)h(s_2)\,ds_1\,ds_2
\end{aligned}
\tag{10.22}
\]

Hence, $\mathrm{Cov}[Y(t_1), Y(t_2)]$ depends on $t_1$, $t_2$ only through $t_2 - t_1$, thus showing that
$\{Y(t)\}$ is also weakly stationary.

The preceding expression for $R_Y(t_2 - t_1) = \mathrm{Cov}[Y(t_1), Y(t_2)]$ is, however, more
compactly and usefully expressed in terms of Fourier transforms of $R_X$ and $R_Y$. Let,
for $i = \sqrt{-1}$,

\[ \tilde{R}_X(w) = \int e^{-iws}R_X(s)\,ds \]

and

\[ \tilde{R}_Y(w) = \int e^{-iws}R_Y(s)\,ds \]

denote the Fourier transforms, respectively, of $R_X$ and $R_Y$. The function $\tilde{R}_X(w)$ is also
called the power spectral density of the process $\{X(t)\}$. Also, let

\[ \tilde{h}(w) = \int e^{-iws}h(s)\,ds \]

denote the Fourier transform of the function $h$. Then, from Equation (10.22),

\[
\begin{aligned}
\tilde{R}_Y(w) &= \iiint e^{-iws}R_X(s - s_2 + s_1)h(s_1)h(s_2)\,ds_1\,ds_2\,ds \\
&= \iiint e^{-iw(s - s_2 + s_1)}R_X(s - s_2 + s_1)\,ds\ e^{-iws_2}h(s_2)\,ds_2\ e^{iws_1}h(s_1)\,ds_1 \\
&= \tilde{R}_X(w)\tilde{h}(w)\tilde{h}(-w)
\end{aligned}
\tag{10.23}
\]

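Now, using the representation

\[
\begin{aligned}
e^{ix} &= \cos x + i\sin x, \\
e^{-ix} &= \cos(-x) + i\sin(-x) = \cos x - i\sin x
\end{aligned}
\]

we obtain

\[
\begin{aligned}
\tilde{h}(w)\tilde{h}(-w) &= \left[\int h(s)\cos(ws)\,ds - i\int h(s)\sin(ws)\,ds\right] \\
&\quad \times \left[\int h(s)\cos(ws)\,ds + i\int h(s)\sin(ws)\,ds\right] \\
&= \left[\int h(s)\cos(ws)\,ds\right]^2 + \left[\int h(s)\sin(ws)\,ds\right]^2 \\
&= \left|\int h(s)e^{-iws}\,ds\right|^2 \\
&= |\tilde{h}(w)|^2
\end{aligned}
\]

Hence, from Equation (10.23) we obtain

\[ \tilde{R}_Y(w) = \tilde{R}_X(w)|\tilde{h}(w)|^2 \]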


In words, the Fourier transform of the covariance function of the output process is 
equal to the square of the amplitude of the Fourier transform of the impulse function 
multiplied by the Fourier transform of the covariance function of the input process. 


Exercises 


In the following exercises $\{B(t), t \ge 0\}$ is a standard Brownian motion process and
$T_a$ denotes the time it takes this process to hit $a$.

*1. What is the distribution of $B(s) + B(t)$, $s \le t$?

2. Compute the conditional distribution of $B(s)$ given that $B(t_1) = A$ and $B(t_2) =
B$, where $0 < t_1 < s < t_2$.

*3. Compute $E[B(t_1)B(t_2)B(t_3)]$ for $t_1 < t_2 < t_3$.

4. Show that

\[ P\{T_a < \infty\} = 1, \qquad E[T_a] = \infty, \quad a \ne 0 \]

*5. What is $P\{T_1 < T_{-1} < T_2\}$?

6. Suppose you own one share of a stock whose price changes according to a standard 
Brownian motion process. Suppose that you purchased the stock at a price b + 
c, c > 0, and the present price is b. You have decided to sell the stock either when 
it reaches the price b + c or when an additional time t goes by (whichever occurs 
first). What is the probability that you do not recover your purchase price? 

7. Compute an expression for

\[ P\left\{\max_{t_1 \le s \le t_2} B(s) > x\right\} \]

8. Consider the random walk that in each $\Delta t$ time unit either goes up or down the
amount $\sqrt{\Delta t}$ with respective probabilities $p$ and $1 - p$, where $p = \frac{1}{2}(1 + \mu\sqrt{\Delta t})$.

(a) Argue that as $\Delta t \to 0$ the resulting limiting process is a Brownian motion
process with drift rate $\mu$.

(b) Using part (a) and the results of the gambler’s ruin problem (Section 4.5.1),
compute the probability that a Brownian motion process with drift rate $\mu$
goes up $A$ before going down $B$, $A > 0$, $B > 0$.

9. Let $\{X(t), t \ge 0\}$ be a Brownian motion process with drift coefficient $\mu$ and
variance parameter $\sigma^2$. What is the joint density function of $X(s)$ and $X(t)$, $s < t$?

*10. Let $\{X(t), t \ge 0\}$ be a Brownian motion process with drift coefficient $\mu$ and
variance parameter $\sigma^2$. What is the conditional distribution of $X(t)$ given that
$X(s) = c$ when

(a) $s < t$?

(b) $t < s$?

11. Consider a process whose value changes every $h$ time units; its new value being its
old value multiplied either by the factor $e^{\sigma\sqrt{h}}$ with probability $p = \frac{1}{2}(1 + \frac{\mu}{\sigma}\sqrt{h})$,
or by the factor $e^{-\sigma\sqrt{h}}$ with probability $1 - p$. As $h$ goes to zero, show that this
process converges to geometric Brownian motion with drift coefficient $\mu$ and
variance parameter $\sigma^2$.

12. A stock is presently selling at a price of $50 per share. After one time period, its 
selling price will (in present value dollars) be either $150 or $25. An option to 
purchase y units of the stock at time 1 can be purchased at cost cy. 

(a) What should c be in order for there to be no sure win? 

(b) If c = 4, explain how you could guarantee a sure win. 

(c) If c = 10, explain how you could guarantee a sure win.

(d) Use the arbitrage theorem to verify your answer to part (a). 

13. Verify the statement made in the remark following Example 10.2. 

14. The present price of a stock is 100. The price at time 1 will be either 50, 100, or 
200. An option to purchase y shares of the stock at time 1 for the (present value) 
price ky costs cy. 

(a) If k = 120, show that an arbitrage opportunity occurs if and only if c > 80/3. 

(b) If k = 80, show that there is not an arbitrage opportunity if and only if 
20 < c < 40. 

15. The current price of a stock is 100. Suppose that the logarithm of the price of
the stock changes according to a Brownian motion process with drift coefficient
$\mu = 2$ and variance parameter $\sigma^2 = 1$. Give the Black-Scholes cost of an option
to buy the stock at time 10 for a cost of

(a) 100 per unit. 

(b) 120 per unit. 

(c) 80 per unit. 

Assume that the continuously compounded interest rate is 5 percent. 

A stochastic process $\{Y(t), t \ge 0\}$ is said to be a Martingale process if, for $s < t$,

\[ E[Y(t) \mid Y(u), 0 \le u \le s] = Y(s) \]

16. If $\{Y(t), t \ge 0\}$ is a Martingale, show that

\[ E[Y(t)] = E[Y(0)] \]

17. Show that standard Brownian motion is a Martingale.

18. Show that $\{Y(t), t \ge 0\}$ is a Martingale when

\[ Y(t) = B^2(t) - t \]

What is $E[Y(t)]$?

Hint: First compute $E[Y(t) \mid B(u), 0 \le u \le s]$.

*19. Show that $\{Y(t), t \ge 0\}$ is a Martingale when

\[ Y(t) = \exp\{cB(t) - c^2t/2\} \]

where $c$ is an arbitrary constant. What is $E[Y(t)]$?

An important property of a Martingale is that if you continually observe the 
process and then stop at some time T , then, subject to some technical conditions 
(which will hold in the problems to be considered), 

\[ E[Y(T)] = E[Y(0)] \]

The time T usually depends on the values of the process and is known as a 
stopping time for the Martingale. This result, that the expected value of the stopped 
Martingale is equal to its fixed time expectation, is known as the Martingale 
stopping theorem. 

*20. Let

\[ T = \min\{t : B(t) = 2 - 4t\} \]

That is, $T$ is the first time that standard Brownian motion hits the line $2 - 4t$. Use
the Martingale stopping theorem to find $E[T]$.

21. Let $\{X(t), t \ge 0\}$ be Brownian motion with drift coefficient $\mu$ and variance
parameter $\sigma^2$. That is,

\[ X(t) = \sigma B(t) + \mu t \]

Let $\mu > 0$, and for a positive constant $x$ let

\[
\begin{aligned}
T &= \min\{t : X(t) = x\} \\
&= \min\left\{t : B(t) = \frac{x - \mu t}{\sigma}\right\}
\end{aligned}
\]

That is, $T$ is the first time the process $\{X(t), t \ge 0\}$ hits $x$. Use the Martingale
stopping theorem to show that

\[ E[T] = x/\mu \]

22. Let $X(t) = \sigma B(t) + \mu t$, and for given positive constants $A$ and $B$, let $p$ denote
the probability that $\{X(t), t \ge 0\}$ hits $A$ before it hits $-B$.

(a) Define the stopping time $T$ to be the first time the process hits either $A$ or $-B$.
Use this stopping time and the Martingale defined in Exercise 19 to show that

\[ E[\exp\{c(X(T) - \mu T)/\sigma - c^2T/2\}] = 1 \]

(b) Let $c = -2\mu/\sigma$, and show that

\[ E[\exp\{-2\mu X(T)/\sigma^2\}] = 1 \]

(c) Use part (b) and the definition of $T$ to find $p$.

Hint: What are the possible values of $\exp\{-2\mu X(T)/\sigma^2\}$?

23. Let $X(t) = \sigma B(t) + \mu t$, and define $T$ to be the first time the process $\{X(t), t \ge 0\}$
hits either $A$ or $-B$, where $A$ and $B$ are given positive numbers. Use the Martingale
stopping theorem and part (c) of Exercise 22 to find $E[T]$.

*24. Let $\{X(t), t \ge 0\}$ be Brownian motion with drift coefficient $\mu$ and variance
parameter $\sigma^2$. Suppose that $\mu > 0$. Let $x > 0$ and define the stopping time $T$ (as
in Exercise 21) by

\[ T = \min\{t : X(t) = x\} \]

Use the Martingale defined in Exercise 18, along with the result of Exercise 21,
to show that

\[ \mathrm{Var}(T) = x\sigma^2/\mu^3 \]

In Exercises 25 to 27, $\{X(t), t \ge 0\}$ is a Brownian motion process with drift
parameter $\mu$ and variance parameter $\sigma^2$.

25. Suppose every $\Delta$ time units a process either increases by the amount $\sigma\sqrt{\Delta}$ with
probability $p$ or decreases by the amount $\sigma\sqrt{\Delta}$ with probability $1 - p$ where

\[ p = \frac{1}{2}\left(1 + \frac{\mu}{\sigma}\sqrt{\Delta}\right) \]

Show that as $\Delta$ goes to 0, this process converges to a Brownian motion process
with drift parameter $\mu$ and variance parameter $\sigma^2$.

26. Let $T_y$ be the first time that the process is equal to $y$. For $y > 0$, show that

\[
P(T_y < \infty) =
\begin{cases}
1, & \text{if } \mu \ge 0 \\
e^{2y\mu/\sigma^2}, & \text{if } \mu < 0
\end{cases}
\]

Let $M = \max_{0 \le t < \infty} X(t)$ be the maximal value ever attained. Explain why the
preceding implies that, when $\mu < 0$, $M$ is an exponential random variable with
rate $-2\mu/\sigma^2$.

27. Determine the distribution function of $\min_{0 \le y \le t} X(y)$.
28. Compute the mean and variance of

(a) $\int_0^1 t\,dB(t)$

(b) $\int_0^1 t^2\,dB(t)$

29. Let $Y(t) = tB(1/t)$, $t > 0$ and $Y(0) = 0$.

(a) What is the distribution of $Y(t)$?

(b) Compute $\mathrm{Cov}(Y(s), Y(t))$.

(c) Argue that $\{Y(t), t \ge 0\}$ is a standard Brownian motion process.

30. Let $Y(t) = B(a^2t)/a$ for $a > 0$. Argue that $\{Y(t)\}$ is a standard Brownian motion
process.

31. For $s < t$, argue that $B(s) - \frac{s}{t}B(t)$ and $B(t)$ are independent.

32. Let $\{Z(t), t \ge 0\}$ denote a Brownian bridge process. Show that if

\[ Y(t) = (t + 1)Z(t/(t + 1)) \]

then $\{Y(t), t \ge 0\}$ is a standard Brownian motion process.

33. Let $X(t) = N(t + 1) - N(t)$ where $\{N(t), t \ge 0\}$ is a Poisson process with
rate $\lambda$. Compute

\[ \mathrm{Cov}[X(t), X(t + s)] \]

34. Let $\{N(t), t \ge 0\}$ denote a Poisson process with rate $\lambda$ and define $Y(t)$ to be the
time from $t$ until the next Poisson event.

(a) Argue that $\{Y(t), t \ge 0\}$ is a stationary process.

(b) Compute $\mathrm{Cov}[Y(t), Y(t + s)]$.

35. Let $\{X(t), -\infty < t < \infty\}$ be a weakly stationary process having covariance
function $R_X(s) = \mathrm{Cov}[X(t), X(t + s)]$.

(a) Show that

\[ \mathrm{Var}(X(t + s) - X(t)) = 2R_X(0) - 2R_X(s) \]

(b) If $Y(t) = X(t + 1) - X(t)$ show that $\{Y(t), -\infty < t < \infty\}$ is also weakly
stationary having a covariance function $R_Y(s) = \mathrm{Cov}[Y(t), Y(t + s)]$ that
satisfies

\[ R_Y(s) = 2R_X(s) - R_X(s - 1) - R_X(s + 1) \]

36. Let $Y_1$ and $Y_2$ be independent unit normal random variables and for some constant
$w$ set

\[ X(t) = Y_1\cos wt + Y_2\sin wt, \quad -\infty < t < \infty \]

(a) Show that $\{X(t)\}$ is a weakly stationary process.

(b) Argue that $\{X(t)\}$ is a stationary process.

37. Let $\{X(t), -\infty < t < \infty\}$ be weakly stationary with covariance function $R(s) =
\mathrm{Cov}(X(t), X(t + s))$ and let $\tilde{R}(w)$ denote the power spectral density of the process.

(i) Show that $\tilde{R}(w) = \tilde{R}(-w)$. It can be shown that



(ii) Use the preceding to show that 
Simulation



11.1 Introduction 

Let $\mathbf{X} = (X_1, \ldots, X_n)$ denote a random vector having a given density function
$f(x_1, \ldots, x_n)$ and suppose we are interested in computing

\[ E[g(\mathbf{X})] = \int\!\!\int\cdots\int g(x_1, \ldots, x_n)f(x_1, \ldots, x_n)\,dx_1\,dx_2\cdots dx_n \]

for some $n$-dimensional function $g$. For instance, $g$ could represent the total delay in
queue of the first $[n/2]$ customers when the $\mathbf{X}$ values represent the first $[n/2]$ interarrival
and service times.* In many situations, it is not analytically possible either to compute
the preceding multiple integral exactly or even to numerically approximate it within a
given accuracy. One possibility that remains is to approximate $E[g(\mathbf{X})]$ by means of
simulation.

* We are using the notation [a] to represent the largest integer less than or equal to a.

Introduction to Probability Models, Eleventh Edition. http://dx.doi.org/10.1016/B978-0-12-407948-9.00011-6
© 2014 Elsevier Inc. All rights reserved.

To approximate $E[g(\mathbf{X})]$, start by generating a random vector $\mathbf{X}^{(1)} =
(X_1^{(1)}, \ldots, X_n^{(1)})$ having the joint density $f(x_1, \ldots, x_n)$ and then compute $Y^{(1)} =
g(\mathbf{X}^{(1)})$. Now generate a second random vector (independent of the first) $\mathbf{X}^{(2)}$ and
compute $Y^{(2)} = g(\mathbf{X}^{(2)})$. Keep on doing this until $r$, a fixed number, of independent
and identically distributed random variables $Y^{(i)} = g(\mathbf{X}^{(i)})$, $i = 1, \ldots, r$ have been
generated. Now by the strong law of large numbers, we know that

\[ \lim_{r\to\infty}\frac{Y^{(1)} + \cdots + Y^{(r)}}{r} = E[Y^{(i)}] = E[g(\mathbf{X})] \]

and so we can use the average of the generated $Y$s as an estimate of $E[g(\mathbf{X})]$. This
approach to estimating $E[g(\mathbf{X})]$ is called the Monte Carlo simulation approach.
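
The Monte Carlo recipe above can be sketched in a few lines. This is an illustration under assumptions made here, not the text's own code: `monte_carlo`, `g`, and `sample_x` are hypothetical names, and the example distribution (two independent uniform (0, 1) coordinates with $g(\mathbf{x}) = x_1x_2$, whose mean is $1/4$) is chosen only because the exact answer is known.

```python
import random

def monte_carlo(g, sample_x, r, seed=0):
    """Average g over r independent draws of the random vector X."""
    rng = random.Random(seed)
    return sum(g(sample_x(rng)) for _ in range(r)) / r

# Hypothetical example: X = (X_1, X_2) with independent uniform (0, 1) coordinates.
def sample_x(rng):
    return (rng.random(), rng.random())

def g(x):
    return x[0] * x[1]

estimate = monte_carlo(g, sample_x, r=100000)
print(estimate)  # should be near E[X_1 X_2] = 1/4
```

The estimate carries sampling error that shrinks roughly like $1/\sqrt{r}$, so the printed value is close to, but not exactly, $0.25$.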

Clearly there remains the problem of how to generate, or simulate , random vectors 
having a specified joint distribution. The first step in doing this is to be able to generate 
random variables from a uniform distribution on (0, 1). One way to do this would 
be to take 10 identical slips of paper, numbered 0, 1,..., 9, place them in a hat and 
then successively select n slips, with replacement, from the hat. The sequence of digits
obtained (with a decimal point in front) can be regarded as the value of a uniform (0, 1)
random variable rounded off to the nearest $(1/10)^n$. For instance, if the sequence of digits
selected is 3, 8, 7, 2, 1, then the value of the uniform (0, 1) random variable is 0.38721 (to
the nearest 0.00001). Tables of the values of uniform (0, 1) random variables, known as
random number tables, have been extensively published (for instance, see The RAND 
Corporation, A Million Random Digits with 100,000 Normal Deviates (New York: The 
Free Press, 1955)). Table 11.1 is such a table. 

However, this is not the way in which digital computers simulate uniform (0, 1) 
random variables. In practice, they use pseudo random numbers instead of truly random 
ones. Most random number generators start with an initial value $X_0$, called the seed,
and then recursively compute values by specifying positive integers $a$, $c$, and $m$, and
then letting

\[ X_{n+1} = (aX_n + c) \text{ modulo } m, \quad n \ge 0 \]

where the preceding means that $aX_n + c$ is divided by $m$ and the remainder is taken as
the value of $X_{n+1}$. Thus each $X_n$ is either $0, 1, \ldots,$ or $m - 1$ and the quantity $X_n/m$
is taken as an approximation to a uniform (0, 1) random variable. It can be shown that
subject to suitable choices for $a$, $c$, $m$, the preceding gives rise to a sequence of numbers
that looks as if it were generated from independent uniform (0, 1) random variables.
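
The recursion above is short enough to write out directly. The generator below is a minimal sketch; the name `lcg` is hypothetical, and the parameters in the usage lines ($a = 16807$, $c = 0$, $m = 2^{31} - 1$) are one commonly cited choice assumed here for illustration, not values given in the text.

```python
from itertools import islice

def lcg(seed, a, c, m):
    """Yield X_1/m, X_2/m, ... where X_{n+1} = (a*X_n + c) mod m and X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m  # remainder after dividing a*X_n + c by m
        yield x / m          # approximation to a uniform (0, 1) value

# Assumed illustrative parameters; the text does not fix a, c, or m.
numbers = list(islice(lcg(seed=12345, a=16807, c=0, m=2**31 - 1), 5))
print(numbers)
```

The same seed always reproduces the same sequence, which is what makes these numbers pseudo random rather than truly random.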

As our starting point in the simulation of random variables from an arbitrary
distribution, we shall suppose that we can simulate from the uniform (0, 1) distribution,
and we shall use the term “random numbers” to mean independent random variables 
from this distribution. In Sections 11.2 and 11.3 we present both general and special 
techniques for simulating continuous random variables; and in Section 11.4 we do the 
same for discrete random variables. In Section 11.5 we discuss the simulation both 
of jointly distributed random variables and stochastic processes. Particular attention 
is given to the simulation of nonhomogeneous Poisson processes, and in fact three 
different approaches for this are discussed. Simulation of two-dimensional Poisson 
processes is discussed in Section 11.5.2. In Section 11.6 we discuss various methods 
for increasing the precision of the simulation estimates by reducing their variance; and 
in Section 11.7 we consider the problem of choosing the number of simulation runs 
needed to attain a desired level of precision. Before beginning this program, however, 
let us consider two applications of simulation to combinatorial problems. 

Table 11.1 A Random Number Table

04839 96423 24878 82651 66566 14778 76797 14780 13300 87074
68086 26432 46901 20848 89768 81536 86645 12659 92259 57102
39064 66432 84673 40027 32832 61362 98947 96067 64760 64584
25669 26422 44407 44048 37937 63904 45766 66134 75470 66520
64117 94305 26766 25940 39972 22209 71500 64568 91402 42416
87917 77341 42206 35126 74087 99547 81817 42607 43808 76655
62797 56170 86324 88072 76222 36086 84637 93161 76038 65855
95876 55293 18988 27354 26575 08625 40801 59920 29841 80150
29888 88604 67917 48708 18912 82271 65424 69774 33611 54262
73577 12908 30883 18317 28290 35797 05998 41688 34952 37888
27958 30134 04024 86385 29880 99730 55536 84855 29080 09250
90999 49127 20044 59931 06115 20542 18059 02008 73708 83517
18845 49618 02304 51038 20655 58727 28168 15475 56942 53389
94824 78171 84610 82834 09922 25417 44137 48413 25555 21246
35605 81263 39667 47358 56873 56307 61607 49518 89356 20103
33362 64270 01638 92477 66969 98420 04880 45585 46565 04102
88720 82765 34476 17032 87589 40836 32427 70002 70663 88863
39475 46473 23219 53416 94970 25832 69975 94884 19661 72828
06990 67245 68350 82948 11398 42878 80287 88267 47363 46634
40980 07391 58745 25774 22987 80059 39911 96189 41151 14222
83974 29992 65381 38857 50490 83765 55657 14361 31720 57375
33339 31926 14883 24413 59744 92351 97473 89286 35931 04110
31662 25388 61642 34072 81249 35648 56891 69352 48373 45578
93526 70765 10592 04542 76463 54328 02349 17247 28865 14777
20492 38391 91132 21999 59516 81652 27195 48223 46751 22923
04153 53381 79401 21438 83035 92350 36693 31238 59649 91754
05520 91962 04739 13092 97662 24822 94730 06496 35090 04822
47498 87637 99016 71060 88824 71013 18735 20286 23153 72924
23167 49323 45021 33132 12544 41035 80780 45393 44812 12515
23792 14422 15059 45799 22716 19792 09983 74353 68668 30429
85900 98275 32388 52390 16815 69298 82732 38480 73817 32523
42559 78985 05300 22164 24369 54224 35083 19687 11062 91491
14349 82674 66523 44133 00697 35552 35970 19124 63318 29686
17403 53363 44167 64486 64758 75366 76554 31601 12614 33072
23632 27889 47914 02584 37680 20801 72152 39339 34806 08930

Example 11.1 (Generating a Random Permutation) Suppose we are interested in
generating a permutation of the numbers $1, 2, \ldots, n$ that is such that all $n!$ possible
orderings are equally likely. The following algorithm will accomplish this by first
choosing one of the numbers $1, \ldots, n$ at random and then putting that number in
position $n$; it then chooses at random one of the remaining $n - 1$ numbers and puts
that number in position $n - 1$; it then chooses at random one of the remaining $n - 2$
numbers and puts it in position $n - 2$, and so on (where choosing a number at random
means that each of the remaining numbers is equally likely to be chosen). However, so
that we do not have to consider exactly which of the numbers remain to be positioned,
it is convenient and efficient to keep the numbers in an ordered list and then randomly
choose the position of the number rather than the number itself. That is, starting with
any initial ordering $p_1, p_2, \ldots, p_n$, we pick one of the positions $1, \ldots, n$ at random
and then interchange the number in that position with the one in position $n$. Now we
randomly choose one of the positions $1, \ldots, n - 1$ and interchange the number in this
position with the one in position $n - 1$, and so on.

To implement the preceding, we need to be able to generate a random variable that is 
equally likely to take on any of the values $1, 2, \ldots, k$. To accomplish this, let $U$ denote 
a random number (that is, $U$ is uniformly distributed over (0, 1)) and note that $kU$ is 
uniform on $(0, k)$ and so 

$$P\{i - 1 < kU < i\} = \frac{1}{k}, \quad i = 1, \ldots, k$$

Hence, the random variable $I = [kU] + 1$, where $[x]$ denotes the largest integer not 
exceeding $x$, will be such that 

$$P\{I = i\} = P\{[kU] = i - 1\} = P\{i - 1 < kU < i\} = \frac{1}{k}$$

The preceding algorithm for generating a random permutation can now be written as 
follows: 

Step 1: Let $p_1, p_2, \ldots, p_n$ be any permutation of $1, 2, \ldots, n$ (for instance, 
we can choose $p_j = j$, $j = 1, \ldots, n$). 

Step 2: Set $k = n$. 

Step 3: Generate a random number $U$ and let $I = [kU] + 1$. 

Step 4: Interchange the values of $p_I$ and $p_k$. 

Step 5: Let $k = k - 1$ and if $k > 1$ go to step 3. 

Step 6: $p_1, \ldots, p_n$ is the desired random permutation. 

For instance, suppose $n = 4$ and the initial permutation is 1, 2, 3, 4. If the first value 
of $I$ (which is equally likely to be either 1, 2, 3, or 4) is $I = 3$, then the new permutation 
is 1, 2, 4, 3. If the next value of $I$ is $I = 2$, then the new permutation is 1, 4, 2, 3. If the 
final value of $I$ is $I = 2$, then the final permutation is 1, 4, 2, 3, and this is the value of 
the random permutation. 

One very important property of the preceding algorithm is that it can also be used to 
generate a random subset, say of size $r$, of the integers $1, \ldots, n$. Namely, just follow 
the algorithm until the positions $n, n - 1, \ldots, n - r + 1$ are filled. The elements in 
these positions constitute the random subset. ■ 
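
The permutation algorithm and its random-subset variant can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `random_permutation` and `random_subset` and the optional `rng` argument are introduced here and do not come from the text, and `rng.random()` plays the role of the random number $U$.

```python
import random

def random_permutation(n, rng=random):
    """Return a random permutation of 1, ..., n using position swaps."""
    p = list(range(1, n + 1))                      # Step 1: p_j = j
    k = n                                          # Step 2
    while k > 1:
        i = int(k * rng.random()) + 1              # Step 3: I = [kU] + 1
        p[i - 1], p[k - 1] = p[k - 1], p[i - 1]    # Step 4: swap p_I and p_k
        k -= 1                                     # Step 5
    return p                                       # Step 6

def random_subset(n, r, rng=random):
    """Return a random subset of size r of 1, ..., n."""
    p = list(range(1, n + 1))
    for k in range(n, n - r, -1):                  # fill positions n, ..., n-r+1
        i = int(k * rng.random()) + 1
        p[i - 1], p[k - 1] = p[k - 1], p[i - 1]
    return p[n - r:]                               # the last r positions
```

Stopping after $r$ swaps is what makes the subset version cheap when $r$ is much smaller than $n$; only the last $r$ positions are ever read.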

Example 11.2 (Estimating the Number of Distinct Entries in a Large List) Consider 
a list of $n$ entries where $n$ is very large, and suppose we are interested in estimating $d$, 
the number of distinct elements in the list. If we let $m(i)$ denote the number of times that 
the element in position $i$ appears on the list, then we can express $d$ by 

$$d = \sum_{i=1}^{n} \frac{1}{m(i)}$$

To estimate $d$, suppose that we generate a random value $X$ equally likely to be either 
$1, 2, \ldots, n$ (that is, we take $X = [nU] + 1$) and then let $m(X)$ denote the number of 
times the element in position $X$ appears on the list. Then 

$$E\left[\frac{1}{m(X)}\right] = \sum_{i=1}^{n} \frac{1}{m(i)} \cdot \frac{1}{n} = \frac{d}{n}$$

Hence, if we generate $k$ such random variables $X_1, \ldots, X_k$, we can estimate $d$ by 

$$d \approx \frac{n \sum_{i=1}^{k} 1/m(X_i)}{k}$$

Suppose now that each item in the list has a value attached to it, $v(i)$ being the value of 
the $i$th element. The sum of the values of the distinct items, call it $v$, can be expressed 
as 

$$v = \sum_{i=1}^{n} \frac{v(i)}{m(i)}$$

Now if $X = [nU] + 1$, where $U$ is a random number, then 

$$E\left[\frac{v(X)}{m(X)}\right] = \sum_{i=1}^{n} \frac{v(i)}{m(i)} \cdot \frac{1}{n} = \frac{v}{n}$$

Hence, we can estimate $v$ by generating $X_1, \ldots, X_k$ and then estimating $v$ by 

$$\frac{n}{k} \sum_{i=1}^{k} \frac{v(X_i)}{m(X_i)}$$

For an important application of the preceding, let $A_i = \{a_{i,1}, \ldots, a_{i,n_i}\}$, $i = 1, \ldots, s$ 
denote events, and suppose we are interested in estimating $P\left(\bigcup_{i=1}^{s} A_i\right)$. Since 

$$P\left(\bigcup_{i=1}^{s} A_i\right) = \sum_{a \in \cup A_i} P(a) = \sum_{i=1}^{s} \sum_{j=1}^{n_i} \frac{P(a_{i,j})}{m(a_{i,j})}$$

where $m(a_{i,j})$ is the number of events to which the point $a_{i,j}$ belongs, the preceding 
method can be used to estimate $P\left(\bigcup_{i=1}^{s} A_i\right)$. 

Note that the preceding procedure for estimating $v$ can be effected without prior 
knowledge of the set of values $\{v_1, \ldots, v_n\}$. That is, it suffices that we can determine 
the value of an element in a specific place and the number of times that element appears 
on the list. When the set of values is a priori known, there is another approach available 
as will be shown in Example 11.11. ■ 
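
As a rough illustration of the estimator of $d$, the following Python sketch samples $k$ random positions and averages $1/m(X)$. The function name `estimate_distinct` is an assumption made here, and, purely to keep the sketch short, $m(i)$ is looked up from a full count of the list; in a real application one would obtain $m(X)$ only for the sampled positions.

```python
import random
from collections import Counter

def estimate_distinct(items, k, rng=random):
    """Estimate the number of distinct entries from k random positions."""
    n = len(items)
    # Assumption for this sketch: m(.) comes from a full count of the list.
    m = Counter(items)
    total = 0.0
    for _ in range(k):
        x = int(n * rng.random())     # X = [nU] + 1, written zero-based
        total += 1.0 / m[items[x]]    # accumulate 1 / m(X)
    return n * total / k              # n * (average of 1 / m(X_i))
```

The same loop estimates $v$ if each term `1.0 / m[...]` is replaced by the value of the sampled element divided by its count.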


11.2 General Techniques for Simulating Continuous 
Random Variables 

In this section we present three methods for simulating continuous random variables. 

11.2.1 The Inverse Transformation Method 

A general method for simulating a random variable having a continuous distribution, 
called the inverse transformation method, is based on the following proposition. 

Proposition 11.1 Let $U$ be a uniform (0, 1) random variable. For any continuous 
distribution function $F$ if we define the random variable $X$ by 

$$X = F^{-1}(U)$$

then the random variable $X$ has distribution function $F$. ($F^{-1}(u)$ is defined to equal 
that value $x$ for which $F(x) = u$.) 

Proof. 

$$F_X(a) = P\{X \le a\} = P\{F^{-1}(U) \le a\} \qquad (11.1)$$

Now, since $F(x)$ is a monotone function, it follows that $F^{-1}(U) \le a$ if and only if 
$U \le F(a)$. Hence, from Equation (11.1), we see that 

$$F_X(a) = P\{U \le F(a)\} = F(a)$$ ■ 

Hence, we can simulate a random variable $X$ from the continuous distribution $F$, 
when $F^{-1}$ is computable, by simulating a random number $U$ and then setting $X = F^{-1}(U)$. 

Example 11.3 (Simulating an Exponential Random Variable) If $F(x) = 1 - e^{-x}$, 
then $F^{-1}(u)$ is that value of $x$ such that 

$$1 - e^{-x} = u$$

or 

$$x = -\log(1 - u)$$

Hence, if $U$ is a uniform (0, 1) variable, then 

$$F^{-1}(U) = -\log(1 - U)$$

is exponentially distributed with mean 1. Since $1 - U$ is also uniformly distributed on 
(0, 1) it follows that $-\log U$ is exponential with mean 1. Since $cX$ is exponential with 
mean $c$ when $X$ is exponential with mean 1, it follows that $-c \log U$ is exponential 
with mean $c$. ■ 
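
A minimal Python sketch of this example is given below. The function name `exponential` and the `rng` argument are assumptions of the sketch. Because `random()` returns values in $[0, 1)$, the code takes the logarithm of $1 - U$ rather than of $U$, which has the same distribution and avoids $\log 0$.

```python
import math
import random

def exponential(mean=1.0, rng=random):
    """Simulate an exponential with the given mean as -mean * log(U)."""
    u = 1.0 - rng.random()        # lies in (0, 1], so log(u) is defined
    return -mean * math.log(u)    # inverse transform, scaled by the mean
```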


11.2.2 The Rejection Method 

Suppose that we have a method for simulating a random variable having density function 
$g(x)$. We can use this as the basis for simulating from the continuous distribution having 
density $f(x)$ by simulating $Y$ from $g$ and then accepting this simulated value with a 
probability proportional to $f(Y)/g(Y)$. 

Specifically, let $c$ be a constant such that 

$$\frac{f(y)}{g(y)} \le c \quad \text{for all } y$$

We then have the following technique for simulating a random variable having density $f$. 

Rejection Method 

Step 1: Simulate $Y$ having density $g$ and simulate a random number $U$. 

Step 2: If $U \le f(Y)/cg(Y)$ set $X = Y$. Otherwise return to step 1. 

Proposition 11.2 The random variable $X$ generated by the rejection method has 
density function $f$. 

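Proof. Let $X$ be the value obtained, and let $N$ denote the number of necessary iterations. 
Then 

$$\begin{aligned}
P\{X \le x\} &= P\{Y_N \le x\} \\
&= P\{Y \le x \mid U \le f(Y)/cg(Y)\} \\
&= \frac{P\{Y \le x,\ U \le f(Y)/cg(Y)\}}{K} \\
&= \frac{\int P\{Y \le x,\ U \le f(Y)/cg(Y) \mid Y = y\}\, g(y)\, dy}{K} \\
&= \frac{\int_{-\infty}^{x} (f(y)/cg(y))\, g(y)\, dy}{K} \\
&= \frac{\int_{-\infty}^{x} f(y)\, dy}{Kc}
\end{aligned}$$

where $K = P\{U \le f(Y)/cg(Y)\}$. Letting $x \to \infty$ shows that $K = 1/c$ and the proof 
is complete. ■ 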

Remarks 

(i) The preceding method was originally presented by Von Neumann in the special 
case where $g$ was positive only in some finite interval $(a, b)$, and $Y$ was chosen to 
be uniform over $(a, b)$ (that is, $Y = a + (b - a)U$). 

(ii) Note that the way in which we “accept the value $Y$ with probability $f(Y)/cg(Y)$” 
is by generating a uniform (0, 1) random variable $U$ and then accepting $Y$ if 
$U \le f(Y)/cg(Y)$. 

(iii) Since each iteration of the method will, independently, result in an accepted value 
with probability $P\{U \le f(Y)/cg(Y)\} = 1/c$ it follows that the number of iterations 
is geometric with mean $c$. 

(iv) Actually, it is not necessary to generate a new uniform random number when 
deciding whether or not to accept, since at a cost of some additional computation, a 
single random number, suitably modified at each iteration, can be used throughout. 
To see how, note that the actual value of $U$ is not used, only whether or not 
$U \le f(Y)/cg(Y)$. Hence, if $Y$ is rejected (that is, if $U > f(Y)/cg(Y)$) we can 
use the fact that, given $Y$, 

$$\frac{U - f(Y)/cg(Y)}{1 - f(Y)/cg(Y)} = \frac{cUg(Y) - f(Y)}{cg(Y) - f(Y)}$$

is uniform on (0, 1). Hence, this may be used as a uniform random number in 
the next iteration. As this saves the generation of a random number at the cost of 
the preceding computation, whether it is a net savings depends greatly upon the 
method being used to generate random numbers. ■ 

Example 11.4 Let us use the rejection method to generate a random variable having 
density function 

$$f(x) = 20x(1 - x)^3, \quad 0 < x < 1$$

Since this random variable (which is beta with parameters 2, 4) is concentrated in the 
interval (0, 1), let us consider the rejection method with 

$$g(x) = 1, \quad 0 < x < 1$$

To determine the constant $c$ such that $f(x)/g(x) \le c$, we use calculus to determine 
the maximum value of 

$$\frac{f(x)}{g(x)} = 20x(1 - x)^3$$

Differentiation of this quantity yields 

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = 20\left[(1 - x)^3 - 3x(1 - x)^2\right]$$

Setting this equal to 0 shows that the maximal value is attained when $x = \frac{1}{4}$, and thus 

$$\frac{f(x)}{g(x)} \le 20\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^3 = \frac{135}{64} \equiv c$$

Hence, 

$$\frac{f(x)}{cg(x)} = \frac{256}{27}\, x(1 - x)^3$$

and thus the rejection procedure is as follows: 

Step 1: Generate random numbers $U_1$ and $U_2$. 

Step 2: If $U_2 \le \frac{256}{27} U_1 (1 - U_1)^3$, stop and set $X = U_1$. Otherwise return to step 1. 

The average number of times that step 1 will be performed is $c = \frac{135}{64}$. ■ 
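
The rejection method, specialized to this example, might be sketched in Python as follows. The names `rejection_sample` and `beta_2_4`, and the choice to return the iteration count alongside the sample, are assumptions made for illustration; the caller must supply a valid bound $c$ with $f(y)/g(y) \le c$.

```python
import random

def rejection_sample(f, sample_g, g, c, rng=random):
    """Return (X, iterations) with X drawn from density f by rejection."""
    iterations = 0
    while True:
        iterations += 1
        y = sample_g(rng)                 # Step 1: Y with density g ...
        u = rng.random()                  # ... and a random number U
        if u <= f(y) / (c * g(y)):        # Step 2: accept if U <= f(Y)/(c g(Y))
            return y, iterations

def beta_2_4(rng=random):
    """Example 11.4: f(x) = 20 x (1 - x)^3, g uniform on (0, 1), c = 135/64."""
    f = lambda x: 20.0 * x * (1.0 - x) ** 3
    g = lambda x: 1.0
    return rejection_sample(f, lambda r: r.random(), g, 135.0 / 64.0, rng)
```

Averaging the returned iteration counts over many calls gives an empirical check that the mean number of iterations is close to $c = 135/64 \approx 2.11$.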

Example 11.5 (Simulating a Normal Random Variable) To simulate a standard 
normal random variable $Z$ (that is, one with mean 0 and variance 1) note first that the 
absolute value of $Z$ has density function 

$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad 0 < x < \infty \qquad (11.2)$$

We will start by simulating from the preceding density by using the rejection method 
with 

$$g(x) = e^{-x}, \quad 0 < x < \infty$$

Now, note that 

$$\frac{f(x)}{g(x)} = \sqrt{2e/\pi}\, \exp\{-(x - 1)^2/2\} \le \sqrt{2e/\pi}$$

Hence, taking $c = \sqrt{2e/\pi}$ and using the rejection method we can simulate from 
Equation (11.2) as follows: 

(a) Generate independent random variables $Y$ and $U$, $Y$ being exponential with rate 1 
and $U$ being uniform on (0, 1). 

(b) If $U \le \exp\{-(Y - 1)^2/2\}$, or equivalently, if 

$$-\log U \ge (Y - 1)^2/2$$

set $X = Y$. Otherwise return to step (a). 

Once we have simulated a random variable $X$ having Density Function (11.2) we can 
then generate a standard normal random variable $Z$ by letting $Z$ be equally likely to be 
either $X$ or $-X$. 

To improve upon the foregoing, note first that from Example 11.3 it follows that 
$-\log U$ will also be exponential with rate 1. Hence, steps (a) and (b) are equivalent to 
the following: 

(a') Generate independent exponentials with rate 1, $Y_1$ and $Y_2$. 

(b') Set $X = Y_1$ if $Y_2 \ge (Y_1 - 1)^2/2$. Otherwise return to step (a'). 

Now suppose that we accept in step (b'). It then follows by the lack of memory property 
of the exponential that the amount by which $Y_2$ exceeds $(Y_1 - 1)^2/2$ will also be 
exponential with rate 1. 

Hence, summing up, we have the following algorithm which generates an exponential 
with rate 1 and an independent standard normal random variable: 

Step 1: Generate $Y_1$, an exponential random variable with rate 1. 

Step 2: Generate $Y_2$, an exponential with rate 1. 

Step 3: If $Y_2 - (Y_1 - 1)^2/2 > 0$, set $Y = Y_2 - (Y_1 - 1)^2/2$ and go to step 4. Otherwise 
go to step 1. 

Step 4: Generate a random number $U$ and set 

$$Z = \begin{cases} Y_1, & \text{if } U \le \frac{1}{2} \\ -Y_1, & \text{if } U > \frac{1}{2} \end{cases}$$

The random variables $Z$ and $Y$ generated by the preceding are independent with $Z$ 
being normal with mean 0 and variance 1 and $Y$ being exponential with rate 1. (If we 
want the normal random variable to have mean $\mu$ and variance $\sigma^2$, just take $\mu + \sigma Z$.) 

Remarks 

(i) Since $c = \sqrt{2e/\pi} \approx 1.32$, the preceding requires a geometric distributed number 
of iterations of step 2 with mean 1.32. 

(ii) The final random number of step 4 need not be separately simulated but rather can 
be obtained from the first digit of any random number used earlier. That is, suppose 
we generate a random number to simulate an exponential; then we can strip off 
the initial digit of this random number and just use the remaining digits (with the 
decimal point moved one step to the right) as the random number. If this initial digit 
is 0, 1, 2, 3, or 4 (or 0 if the computer is generating binary digits), then we take the 
sign of Z to be positive and take it to be negative otherwise. 

(iii) If we are generating a sequence of standard normal random variables, then we can 
use the exponential obtained in step 4 as the initial exponential needed in step 1 
for the next normal to be generated. Hence, on the average, we can simulate a unit 
normal by generating 1.64 exponentials and computing 1.32 squares. 
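
The four-step algorithm of this example, together with the reuse of the leftover exponential described in remark (iii), could be sketched in Python as below. The names `exp1` and `normal_and_exponential` and the optional `y1` argument are assumptions of this sketch; the sign is drawn from a fresh random number rather than from a stripped digit as in remark (ii).

```python
import math
import random

def exp1(rng):
    """Exponential with rate 1 via the inverse transform of Example 11.3."""
    return -math.log(1.0 - rng.random())

def normal_and_exponential(rng=random, y1=None):
    """Return (Z, Y): a standard normal and an independent rate-1 exponential.

    y1 may be the leftover exponential returned by a previous call.
    """
    while True:
        if y1 is None:
            y1 = exp1(rng)                           # Step 1
        y2 = exp1(rng)                               # Step 2
        excess = y2 - (y1 - 1.0) ** 2 / 2.0          # Step 3
        if excess > 0:
            z = y1 if rng.random() <= 0.5 else -y1   # Step 4: random sign
            return z, excess
        y1 = None                                    # rejected: back to step 1
```

Passing the returned `excess` back in as `y1` on the next call saves one exponential per normal generated.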


11.2.3 The Hazard Rate Method 

Let $F$ be a continuous distribution function with $\bar{F}(0) = 1$, where $\bar{F}(t) = 1 - F(t)$. 
Recall that $\lambda(t)$, the hazard rate function of $F$, is defined by 

$$\lambda(t) = \frac{f(t)}{\bar{F}(t)}, \quad t \ge 0$$

(where $f(t) = F'(t)$ is the density function). Recall also that $\lambda(t)$ represents the 
instantaneous probability intensity that an item having life distribution $F$ will fail at 
time $t$ given it has survived to that time. 

Suppose now that we are given a bounded function $\lambda(t)$, such that $\int_0^\infty \lambda(t)\, dt = \infty$, 
and we desire to simulate a random variable $S$ having $\lambda(t)$ as its hazard rate function. 

To do so let $\lambda$ be such that 

$$\lambda(t) \le \lambda \quad \text{for all } t \ge 0$$

To simulate from $\lambda(t)$, $t \ge 0$, we will 

(a) simulate a Poisson process having rate $\lambda$. We will then only “accept” or “count” 
certain of these Poisson events. Specifically we will 

(b) count an event that occurs at time $t$, independently of all else, with probability 
$\lambda(t)/\lambda$. 

We now have the following proposition. 

Proposition 11.3 The time of the first counted event, call it $S$, is a random variable 
whose distribution has hazard rate function $\lambda(t)$, $t \ge 0$. 

Proof. 

$$\begin{aligned}
P\{t < S < t + dt \mid S > t\}
&= P\{\text{first counted event in } (t, t + dt) \mid \text{no counted events prior to } t\} \\
&= P\{\text{Poisson event in } (t, t + dt), \text{ it is counted} \mid \text{no counted events prior to } t\} \\
&= P\{\text{Poisson event in } (t, t + dt), \text{ it is counted}\} \\
&= [\lambda\, dt + o(dt)]\, \frac{\lambda(t)}{\lambda} = \lambda(t)\, dt + o(dt)
\end{aligned}$$

which completes the proof. Note that the next to last equality follows from the independent 
increment property of Poisson processes. ■ 

Because the interarrival times of a Poisson process having rate $\lambda$ are exponential with 
rate $\lambda$, it thus follows from Example 11.3 and the previous proposition that the following 
algorithm will generate a random variable having hazard rate function $\lambda(t)$, $t \ge 0$. 


Hazard Rate Method for Generating $S$: $\lambda_S(t) = \lambda(t)$ 

Let $\lambda$ be such that $\lambda(t) \le \lambda$ for all $t \ge 0$. Generate pairs of random variables $U_i, X_i$, $i \ge 1$, 
with $X_i$ being exponential with rate $\lambda$ and $U_i$ being uniform (0, 1), stopping at 

$$N = \min\left\{n: U_n \le \lambda\left(\sum_{i=1}^{n} X_i\right)\Big/\lambda\right\}$$

Set 

$$S = \sum_{i=1}^{N} X_i$$
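
A Python sketch of the hazard rate method follows. The function name `hazard_rate_sample`, its arguments `hazard` (the function $\lambda(t)$) and `bound` (the constant $\lambda$), and the decision to return $N$ along with $S$ are all assumptions made here for illustration.

```python
import math
import random

def hazard_rate_sample(hazard, bound, rng=random):
    """Simulate (S, N) where S has hazard rate hazard(t) <= bound, t >= 0."""
    t = 0.0
    n = 0
    while True:
        n += 1
        # X_n: exponential interarrival time with rate `bound`
        t += -math.log(1.0 - rng.random()) / bound
        # count the event at time t with probability hazard(t) / bound
        if rng.random() <= hazard(t) / bound:
            return t, n          # S = X_1 + ... + X_N, and N itself
```

For instance, `hazard_rate_sample(lambda t: 0.5, 1.0)` should behave like an exponential with rate 0.5, since a constant hazard rate characterizes the exponential distribution.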


To compute $E[N]$ we need the result, known as Wald’s equation, which states that 
if $X_1, X_2, \ldots$ are independent and identically distributed random variables that are 
observed in sequence up to some random time $N$ then 

$$E\left[\sum_{i=1}^{N} X_i\right] = E[N]E[X]$$

More precisely let $X_1, X_2, \ldots$ denote a sequence of independent random variables and 
consider the following definition. 

Definition 11.1 An integer-valued random variable $N$ is said to be a stopping time 
for the sequence $X_1, X_2, \ldots$ if the event $\{N = n\}$ is independent of $X_{n+1}, X_{n+2}, \ldots$ 
for all $n = 1, 2, \ldots$. 

Intuitively, we observe the $X_n$s in sequential order and $N$ denotes the number 
observed before stopping. If $N = n$, then we have stopped after observing $X_1, \ldots, X_n$ 
and before observing $X_{n+1}, X_{n+2}, \ldots$ for all $n = 1, 2, \ldots$. 

Example 11.6 Let $X_n$, $n = 1, 2, \ldots$, be independent and such that 

$$P\{X_n = 0\} = P\{X_n = 1\} = \frac{1}{2}, \quad n = 1, 2, \ldots$$

If we let 

$$N = \min\{n: X_1 + \cdots + X_n = 10\}$$

then $N$ is a stopping time. We may regard $N$ as being the stopping time of an experiment 
that successively flips a fair coin and then stops when the number of heads reaches 10. 


Proposition 11.4 (Wald’s Equation) If $X_1, X_2, \ldots$ are independent and identically 
distributed random variables having finite expectations, and if $N$ is a stopping time for 
$X_1, X_2, \ldots$ such that $E[N] < \infty$, then 

$$E\left[\sum_{n=1}^{N} X_n\right] = E[N]E[X]$$

Proof. Letting 

$$I_n = \begin{cases} 1, & \text{if } N \ge n \\ 0, & \text{if } N < n \end{cases}$$

we have 

$$\sum_{n=1}^{N} X_n = \sum_{n=1}^{\infty} X_n I_n$$

Hence, 

$$E\left[\sum_{n=1}^{N} X_n\right] = E\left[\sum_{n=1}^{\infty} X_n I_n\right] = \sum_{n=1}^{\infty} E[X_n I_n] \qquad (11.3)$$

However, $I_n = 1$ if and only if we have not stopped after successively observing 
$X_1, \ldots, X_{n-1}$. Therefore, $I_n$ is determined by $X_1, \ldots, X_{n-1}$ and is thus independent 
of $X_n$. From Equation (11.3) we thus obtain 

$$\begin{aligned}
E\left[\sum_{n=1}^{N} X_n\right] &= \sum_{n=1}^{\infty} E[X_n]E[I_n] \\
&= E[X] \sum_{n=1}^{\infty} E[I_n] \\
&= E[X]\, E\left[\sum_{n=1}^{\infty} I_n\right] \\
&= E[X]E[N]
\end{aligned}$$ ■ 
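
Wald’s equation can be checked empirically on the coin-flipping stopping time of Example 11.6. The sketch below is only an illustration; the function name `flips_until_heads` is an assumption. Since the sum of the flips at stopping always equals 10 and $E[X] = 1/2$, the equation predicts $E[N] = 20$.

```python
import random

def flips_until_heads(target, rng=random):
    """Flip a fair coin until `target` heads; return (N, X_1 + ... + X_N)."""
    n = heads = 0
    while heads < target:
        n += 1
        heads += 1 if rng.random() < 0.5 else 0   # X_n is 0 or 1, each w.p. 1/2
    return n, heads
```

Averaging the first component over many runs and multiplying by $1/2$ should give a value close to 10, the (constant) value of the stopped sum.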


Returning to the hazard rate method, we have 

$$S = \sum_{i=1}^{N} X_i$$

As $N = \min\left\{n: U_n \le \lambda\left(\sum_{i=1}^{n} X_i\right)/\lambda\right\}$ it follows that the event that $N = n$ is independent 
of $X_{n+1}, X_{n+2}, \ldots$. Hence, by Wald’s equation, 

$$E[S] = E[N]E[X_i] = \frac{E[N]}{\lambda}$$

or 

$$E[N] = \lambda E[S]$$

where $E[S]$ is the mean of the desired random variable. 


11.3 Special Techniques for Simulating Continuous 
Random Variables 


Special techniques have been devised to simulate from most of the common continuous 
distributions. We now present certain of these. 


11.3.1 The Normal Distribution 


Let $X$ and $Y$ denote independent standard normal random variables and thus have the 
joint density function 

$$f(x, y) = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2}, \quad -\infty < x < \infty, \ -\infty < y < \infty$$

Consider now the polar coordinates of the point $(X, Y)$. As shown in Figure 11.1, 

$$R^2 = X^2 + Y^2, \quad \Theta = \tan^{-1}(Y/X)$$

To obtain the joint density of $R^2$ and $\Theta$, consider the transformation 

$$d = x^2 + y^2, \quad \theta = \tan^{-1}(y/x)$$

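The Jacobian of this transformation is 

$$J = \begin{vmatrix} \dfrac{\partial d}{\partial x} & \dfrac{\partial d}{\partial y} \\[2ex] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{vmatrix}
= \begin{vmatrix} 2x & 2y \\[1ex] \dfrac{1}{1 + y^2/x^2}\left(\dfrac{-y}{x^2}\right) & \dfrac{1}{1 + y^2/x^2}\left(\dfrac{1}{x}\right) \end{vmatrix}
= 2 \begin{vmatrix} x & y \\[1ex] -\dfrac{y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2} \end{vmatrix} = 2$$

Figure 11.1 (polar coordinates of the point $(X, Y)$: $R^2 = X^2 + Y^2$, $\Theta = \tan^{-1}(Y/X)$) 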

Hence, from Section 2.5.3 the joint density of $R^2$ and $\Theta$ is given by 

$$f_{R^2, \Theta}(d, \theta) = \frac{1}{2\pi}\, e^{-d/2} \cdot \frac{1}{2} = \frac{1}{2}\, e^{-d/2} \cdot \frac{1}{2\pi}, \quad 0 < d < \infty, \ 0 < \theta < 2\pi$$

Thus, we can conclude that $R^2$ and $\Theta$ are independent with $R^2$ having an exponential 
distribution with rate $\frac{1}{2}$ and $\Theta$ being uniform on $(0, 2\pi)$. 

Let us now go in reverse from the polar to the rectangular coordinates. From the 
preceding if we start with $W$, an exponential random variable with rate $\frac{1}{2}$ ($W$ plays 
the role of $R^2$) and with $V$, independent of $W$ and uniformly distributed over $(0, 2\pi)$ 
($V$ plays the role of $\Theta$) then $X = \sqrt{W} \cos V$, $Y = \sqrt{W} \sin V$ will be independent 
standard normals. Hence, using the results of Example 11.3 we see that if $U_1$ and $U_2$ 
are independent uniform (0, 1) random numbers, then 

$$\begin{aligned}
X &= (-2 \log U_1)^{1/2} \cos(2\pi U_2), \\
Y &= (-2 \log U_1)^{1/2} \sin(2\pi U_2)
\end{aligned} \qquad (11.4)$$

are independent standard normal random variables. 
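
Equation (11.4) translates almost directly into code. The following Python sketch (the function name `box_muller` is an assumption) uses $1 - U_1$ in place of $U_1$ so that the logarithm is never taken at 0; the two have the same distribution.

```python
import math
import random

def box_muller(rng=random):
    """Return two independent standard normals from two random numbers."""
    u1 = 1.0 - rng.random()                 # in (0, 1], keeps log(u1) finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))      # sqrt of an exponential, rate 1/2
    theta = 2.0 * math.pi * u2              # uniform angle on (0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)
```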

Remark The fact that $X^2 + Y^2$ has an exponential distribution with rate $\frac{1}{2}$ is quite 
interesting for, by the definition of the chi-square distribution, $X^2 + Y^2$ has a chi-squared 
distribution with two degrees of freedom. Hence, these two distributions are 
identical. 

The preceding approach to generating standard normal random variables is called 
the Box-Muller approach. Its efficiency suffers somewhat from its need to compute the 
preceding sine and cosine values. There is, however, a way to get around this potentially 
time-consuming difficulty. To begin, note that if $U$ is uniform on (0, 1), then $2U$ is 
uniform on (0, 2), and so $2U - 1$ is uniform on $(-1, 1)$. Thus, if we generate random 
numbers $U_1$ and $U_2$ and set 

$$V_1 = 2U_1 - 1, \quad V_2 = 2U_2 - 1$$

then $(V_1, V_2)$ is uniformly distributed in the square of area 4 centered at (0, 0) (see 
Figure 11.2). 

Figure 11.2 (the square of area 4 centered at (0, 0), showing the point $(V_1, V_2)$) 

Suppose now that we continually generate such pairs $(V_1, V_2)$ until we obtain one 
that is contained in the circle of radius 1 centered at (0, 0), that is, until $(V_1, V_2)$ is such 
that $V_1^2 + V_2^2 \le 1$. It now follows that such a pair $(V_1, V_2)$ is uniformly distributed 
in the circle. If we let $R$, $\Theta$ denote the polar coordinates of this pair, then it is easy to 
verify that $R$ and $\Theta$ are independent, with $R^2$ being uniformly distributed on (0, 1), 
and $\Theta$ uniformly distributed on $(0, 2\pi)$. 

Since 

$$\sin \Theta = \frac{V_2}{R} = \frac{V_2}{\sqrt{V_1^2 + V_2^2}}, \quad \cos \Theta = \frac{V_1}{R} = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}$$

it follows from Equation (11.4) that we can generate independent standard normals $X$ 
and $Y$ by generating another random number $U$ and setting 

$$X = (-2 \log U)^{1/2}\, V_1/R, \quad Y = (-2 \log U)^{1/2}\, V_2/R$$

In fact, since (conditional on $V_1^2 + V_2^2 \le 1$) $R^2$ is uniform on (0, 1) and is independent 
of $\Theta$, we can use it instead of generating a new random number $U$, thus showing that 

$$X = (-2 \log R^2)^{1/2}\, \frac{V_1}{R} = \sqrt{\frac{-2 \log S}{S}}\; V_1, \quad
Y = (-2 \log R^2)^{1/2}\, \frac{V_2}{R} = \sqrt{\frac{-2 \log S}{S}}\; V_2$$

are independent standard normals, where 

$$S = R^2 = V_1^2 + V_2^2$$

Summing up, we thus have the following approach to generating a pair of independent 
standard normals: 

Step 1: Generate random numbers $U_1$ and $U_2$. 

Step 2: Set $V_1 = 2U_1 - 1$, $V_2 = 2U_2 - 1$, $S = V_1^2 + V_2^2$. 

Step 3: If $S > 1$, return to step 1. 

Step 4: Return the independent unit normals 

$$X = \sqrt{\frac{-2 \log S}{S}}\; V_1, \quad Y = \sqrt{\frac{-2 \log S}{S}}\; V_2$$

The preceding is called the polar method. Since the probability that a random point 
in the square will fall within the circle is equal to $\pi/4$ (the area of the circle divided 
by the area of the square), it follows that, on average, the polar method will require 
$4/\pi \approx 1.273$ iterations of step 1. Hence, it will, on average, require 2.546 random 
numbers, 1 logarithm, 1 square root, 1 division, and 4.546 multiplications to generate 
2 independent standard normals. 
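
The polar method can be sketched in Python as below. The function name `polar_method` is an assumption; the additional check that $S > 0$ is a safeguard added in this sketch (it avoids dividing by zero in the essentially impossible case $V_1 = V_2 = 0$) and is not part of the steps above.

```python
import math
import random

def polar_method(rng=random):
    """Return two independent standard normals without sine or cosine."""
    while True:
        v1 = 2.0 * rng.random() - 1.0        # Steps 1-2: uniform on (-1, 1)
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:                   # Step 3: keep points in the circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return factor * v1, factor * v2  # Step 4
```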

11.3.2 The Gamma Distribution 

To simulate from a gamma distribution with parameters $(n, \lambda)$, where $n$ is an integer, 
we use the fact that the sum of $n$ independent exponential random variables each having 
rate $\lambda$ has this distribution. Hence, if $U_1, \ldots, U_n$ are independent uniform (0, 1) random 
variables, 

$$X = -\frac{1}{\lambda} \sum_{i=1}^{n} \log U_i = -\frac{1}{\lambda} \log\left(\prod_{i=1}^{n} U_i\right)$$

has the desired distribution. 

When $n$ is large, there are other techniques available that do not require so many 
random numbers. One possibility is to use the rejection procedure with $g(x)$ being 
taken as the density of an exponential random variable with mean $n/\lambda$ (as this is the 
mean of the gamma). It can be shown that for large $n$ the average number of iterations 
needed by the rejection algorithm is $e[(n - 1)/2\pi]^{1/2}$. In addition, if we wanted to 
generate a series of gammas, then, just as in Example 11.5, we can arrange things so 
that upon acceptance we obtain not only a gamma random variable but also, for free, 
an exponential random variable that can then be used in obtaining the next gamma (see 
Exercise 8). 
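
For integer $n$, the sum-of-exponentials construction might be coded as follows. The function name `gamma_int` is an assumption of this sketch. The logarithms are summed one at a time rather than taking the logarithm of the product, a choice made here because the product of many uniforms can underflow to 0 in floating point.

```python
import math
import random

def gamma_int(n, lam, rng=random):
    """Gamma(n, lam) for integer n as the sum of n exponentials with rate lam."""
    log_sum = 0.0
    for _ in range(n):
        log_sum += math.log(1.0 - rng.random())   # log U_i, with U_i in (0, 1]
    return -log_sum / lam
```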

11.3.3 The Chi-Squared Distribution 

The chi-squared distribution with $n$ degrees of freedom is the distribution of $\chi_n^2 = 
Z_1^2 + \cdots + Z_n^2$ where $Z_i$, $i = 1, \ldots, n$ are independent standard normals. Using 
the fact noted in the remark at the end of Section 11.3.1 we see that $Z_1^2 + Z_2^2$ has an 
exponential distribution with rate $\frac{1}{2}$. Hence, when $n$ is even, say $n = 2k$, $\chi_{2k}^2$ has a 
gamma distribution with parameters $(k, \frac{1}{2})$. Hence, $-2 \log\left(\prod_{i=1}^{k} U_i\right)$ has a chi-squared 
distribution with $2k$ degrees of freedom. We can simulate a chi-squared random variable 
with $2k + 1$ degrees of freedom by first simulating a standard normal random variable 
$Z$ and then adding $Z^2$ to the preceding. That is, 

$$\chi_{2k+1}^2 = Z^2 - 2 \log\left(\prod_{i=1}^{k} U_i\right)$$

where $Z, U_1, \ldots, U_k$ are independent with $Z$ being a standard normal and the others 
being uniform (0, 1) random variables. 
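
A short Python sketch covering both the even and odd cases is shown below. The function name `chi_squared` is an assumption, and for brevity the standard normal $Z$ is taken from the standard library's `gauss` method rather than from one of the methods of Section 11.3.1; any standard normal generator would serve.

```python
import math
import random

def chi_squared(df, rng=random):
    """Chi-squared with df degrees of freedom (df a positive integer)."""
    k, odd = divmod(df, 2)
    x = 0.0
    for _ in range(k):
        x += -2.0 * math.log(1.0 - rng.random())   # -2 log U_i, rate-1/2 exponential
    if odd:
        z = rng.gauss(0.0, 1.0)                    # assumed normal generator
        x += z * z                                 # add Z^2 for the odd degree
    return x
```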


11.3.4 The Beta (n, m) Distribution 

The random variable $X$ is said to have a beta distribution with parameters $n$, $m$ if its 
density is given by 

$$f(x) = \frac{(n + m - 1)!}{(n - 1)!\,(m - 1)!}\, x^{n-1} (1 - x)^{m-1}, \quad 0 < x < 1$$

One approach to simulating from the preceding distribution is to let $U_1, \ldots, U_{n+m-1}$ 
be independent uniform (0, 1) random variables and consider the $n$th smallest value of 
this set, call it $U_{(n)}$. Now $U_{(n)}$ will equal $x$ if, of the $n + m - 1$ variables, 

(i) $n - 1$ are smaller than $x$, 

(ii) one equals $x$, 

(iii) $m - 1$ are greater than $x$. 

Hence, if the $n + m - 1$ uniform random variables are partitioned into three subsets 
of sizes $n - 1$, 1, and $m - 1$ the probability (density) that each of the variables in the 
first set is less than $x$, the variable in the second set equals $x$, and all the variables in 
the third set are greater than $x$ is given by 

$$(P\{U < x\})^{n-1} f_U(x) (P\{U > x\})^{m-1} = x^{n-1}(1 - x)^{m-1}$$

Hence, as there are $(n + m - 1)!/(n - 1)!\,(m - 1)!$ possible partitions, it follows that 
$U_{(n)}$ is beta with parameters $(n, m)$. 

Thus, one way to simulate from the beta distribution is to find the $n$th smallest of a 
set of $n + m - 1$ random numbers. However, when $n$ and $m$ are large, this procedure 
is not particularly efficient. 

For another approach consider a Poisson process with rate 1, and recall that given 
$S_{n+m}$, the time of the $(n + m)$th event, the set of the first $n + m - 1$ event times is 
distributed independently and uniformly on $(0, S_{n+m})$. Hence, given $S_{n+m}$, the $n$th 
smallest of the first $n + m - 1$ event times, that is, $S_n$, is distributed as the $n$th 
smallest of a set of $n + m - 1$ uniform $(0, S_{n+m})$ random variables. But from the 
preceding we can thus conclude that $S_n/S_{n+m}$ has a beta distribution with parameters 
$(n, m)$. Therefore, if $U_1, \ldots, U_{n+m}$ are random numbers, 

$$\frac{-\log \prod_{i=1}^{n} U_i}{-\log \prod_{i=1}^{n+m} U_i} \quad \text{is beta with parameters } (n, m)$$

By writing the preceding as 

$$\frac{-\log \prod_{i=1}^{n} U_i}{-\log \prod_{i=1}^{n} U_i - \log \prod_{i=n+1}^{n+m} U_i}$$

we see that it has the same distribution as $X/(X + Y)$ where $X$ and $Y$ are independent 
gamma random variables with respective parameters $(n, 1)$ and $(m, 1)$. Hence, when 
$n$ and $m$ are large, we can efficiently simulate a beta by first simulating two gamma 
random variables. 
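
Both constructions of a beta $(n, m)$ random variable for integer $n$ and $m$ are easy to sketch in Python. The function names below are assumptions. Note that the second function simulates its two gammas as sums of exponentials, so it illustrates the identity $X/(X + Y)$ rather than the more efficient gamma generators alluded to above.

```python
import math
import random

def beta_order_statistic(n, m, rng=random):
    """Beta(n, m) as the n-th smallest of n + m - 1 random numbers."""
    us = sorted(rng.random() for _ in range(n + m - 1))
    return us[n - 1]

def beta_gamma_ratio(n, m, rng=random):
    """Beta(n, m) as X / (X + Y) with X ~ gamma(n, 1), Y ~ gamma(m, 1)."""
    x = -sum(math.log(1.0 - rng.random()) for _ in range(n))
    y = -sum(math.log(1.0 - rng.random()) for _ in range(m))
    return x / (x + y)
```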


11.3.5 The Exponential Distribution—The Von Neumann Algorithm 

As we have seen, an exponential random variable with rate 1 can be simulated by 
computing the negative of the logarithm of a random number. Most computer programs 
for computing a logarithm, however, involve a power series expansion, and so it might 
be useful to have at hand a second method that is computationally easier. We now 
present such a method due to Von Neumann. 

To begin let $U_1, U_2, \ldots$ be independent uniform (0, 1) random variables and define 
$N$, $N \ge 2$, by 

$$N = \min\{n: U_1 \ge U_2 \ge \cdots \ge U_{n-1} < U_n\}$$

That is, $N$ is the index of the first random number that is greater than its predecessor. 
Let us now compute the joint distribution of $N$ and $U_1$: 

$$P\{N > n, U_1 \le y\} = \int_0^1 P\{N > n, U_1 \le y \mid U_1 = x\}\, dx = \int_0^y P\{N > n \mid U_1 = x\}\, dx$$

Now, given that $U_1 = x$, $N$ will be greater than $n$ if $x \ge U_2 \ge \cdots \ge U_n$ or, equivalently, 
if 

(a) $U_i \le x$, $i = 2, \ldots, n$ 

and 

(b) $U_2 \ge \cdots \ge U_n$ 

Now, (a) has probability $x^{n-1}$ of occurring and given (a), since all of the $(n - 1)!$ 
possible rankings of $U_2, \ldots, U_n$ are equally likely, (b) has probability $1/(n - 1)!$ of 
occurring. Hence, 

$$P\{N > n \mid U_1 = x\} = \frac{x^{n-1}}{(n - 1)!}$$

and so 

$$P\{N > n, U_1 \le y\} = \int_0^y \frac{x^{n-1}}{(n - 1)!}\, dx = \frac{y^n}{n!}$$

which yields 

$$P\{N = n, U_1 \le y\} = P\{N > n - 1, U_1 \le y\} - P\{N > n, U_1 \le y\} = \frac{y^{n-1}}{(n - 1)!} - \frac{y^n}{n!}$$

Upon summing over all the even integers, we see that 

$$P\{N \text{ is even}, U_1 \le y\} = y - \frac{y^2}{2!} + \frac{y^3}{3!} - \frac{y^4}{4!} + \cdots = 1 - e^{-y} \qquad (11.5)$$

We are now ready for the following algorithm for generating an exponential random 
variable with rate 1. 

Step 1: Generate uniform random numbers $U_1, U_2, \ldots$ stopping at 
$N = \min\{n: U_1 \ge \cdots \ge U_{n-1} < U_n\}$. 

Step 2: If $N$ is even accept that run, and go to step 3. If $N$ is odd reject the run, and 
return to step 1. 

Step 3: Set $X$ equal to the number of failed runs plus the first random number in the 
successful run. 

To show that $X$ is exponential with rate 1, first note that the probability of a successful 
run is, from Equation (11.5) with $y = 1$, 

$$P\{N \text{ is even}\} = 1 - e^{-1}$$

Now, in order for $X$ to exceed $x$, the first $[x]$ runs must all be unsuccessful and the next 
run must either be unsuccessful or be successful but have $U_1 > x - [x]$ (where $[x]$ is 
the largest integer not exceeding $x$). As 

$$P\{N \text{ even}, U_1 > y\} = P\{N \text{ even}\} - P\{N \text{ even}, U_1 \le y\} = 1 - e^{-1} - (1 - e^{-y}) = e^{-y} - e^{-1}$$

we see that 

$$P\{X > x\} = e^{-[x]}\left[e^{-1} + e^{-(x - [x])} - e^{-1}\right] = e^{-x}$$

which yields the result. 

Let $T$ denote the number of trials needed to generate a successful run. As each trial is 
a success with probability $1 - e^{-1}$ it follows that $T$ is geometric with mean $1/(1 - e^{-1})$. 
If we let $N_i$ denote the number of uniform random variables used on the $i$th run, $i \ge 1$, 
then $T$ (being the first run $i$ for which $N_i$ is even) is a stopping time for this sequence. 
Hence, by Wald’s equation, the mean number of uniform random variables needed by 
this algorithm is given by 

$$E\left[\sum_{i=1}^{T} N_i\right] = E[N]E[T]$$

Now, 

$$E[N] = \sum_{n=0}^{\infty} P\{N > n\} = 1 + \sum_{n=1}^{\infty} P\{U_1 \ge \cdots \ge U_n\} = 1 + \sum_{n=1}^{\infty} \frac{1}{n!} = e$$

and so 

$$E\left[\sum_{i=1}^{T} N_i\right] = \frac{e}{1 - e^{-1}} \approx 4.3$$

Hence, this algorithm, which computationally speaking is quite easy to perform, requires 
on the average about 4.3 random numbers to execute. 
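
The Von Neumann algorithm can be sketched in Python as follows. The function name `von_neumann_exponential` is an assumption, as is the decision to return the number of uniforms consumed, which makes the figure of about 4.3 random numbers easy to check empirically.

```python
import random

def von_neumann_exponential(rng=random):
    """Return (X, uniforms_used), X exponential with rate 1, using no logarithm."""
    failed_runs = 0
    used = 0
    while True:
        first = prev = rng.random()       # U_1 of this run
        n = 1
        while True:                       # Step 1: extend the run
            u = rng.random()
            n += 1
            if u > prev:                  # U_n exceeds its predecessor: N = n
                break
            prev = u
        used += n
        if n % 2 == 0:                    # Step 2: accept runs with N even
            return failed_runs + first, used   # Step 3
        failed_runs += 1                  # N odd: reject and start a new run
```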


11.4 Simulating from Discrete Distributions 

All of the general methods for simulating from continuous distributions have analogs 
in the discrete case. For instance, if we want to simulate a random variable $X$ having 
probability mass function 

$$P\{X = x_j\} = P_j, \quad j = 1, 2, \ldots, \quad \sum_j P_j = 1$$

we can use the following discrete time analog of the inverse transform technique: 

To simulate $X$ for which $P\{X = x_j\} = P_j$ 

let $U$ be uniformly distributed over (0, 1), and set 

$$X = \begin{cases}
x_1, & \text{if } U < P_1 \\
x_2, & \text{if } P_1 < U < P_1 + P_2 \\
\ \vdots \\
x_j, & \text{if } \sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i \\
\ \vdots
\end{cases}$$

As, 

$$P\{X = x_j\} = P\left\{\sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i\right\} = P_j$$

we see that $X$ has the desired distribution. 
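
A Python sketch of the discrete inverse transform is given below. The function name `discrete_inverse_transform` and the list-based interface are assumptions. The final `return` is a safeguard added here for the case where the supplied probabilities sum to slightly less than 1 because of rounding.

```python
import random

def discrete_inverse_transform(values, probs, rng=random):
    """Return values[j] with probability probs[j]."""
    u = rng.random()
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p               # P_1 + ... + P_j
        if u < cumulative:            # first j with U below the running sum
            return x
    return values[-1]                 # rounding guard, see lead-in
```

Because the search is sequential, listing the values in decreasing order of probability reduces the expected number of comparisons.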

Example 11.7 (The Geometric Distribution) Suppose we want to simulate $X$ such 
that 

$$P\{X = i\} = p(1 - p)^{i-1}, \quad i \ge 1$$

As 

$$\sum_{i=1}^{j-1} P\{X = i\} = 1 - P\{X > j - 1\} = 1 - (1 - p)^{j-1}$$

we can simulate such a random variable by generating a random number $U$ and then 
setting $X$ equal to that value $j$ for which 

$$1 - (1 - p)^{j-1} < U < 1 - (1 - p)^{j}$$

or, equivalently, for which 

$$(1 - p)^{j} < 1 - U < (1 - p)^{j-1}$$

As $1 - U$ has the same distribution as $U$, we can thus define $X$ by 

$$X = \min\{j: (1 - p)^j < U\} = \min\left\{j: j > \frac{\log U}{\log(1 - p)}\right\} = 1 + \left[\frac{\log U}{\log(1 - p)}\right]$$
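
The closed form for the geometric random variable needs only one random number and two logarithms, as the following Python sketch shows. The function name `geometric` is an assumption, and the sketch requires $0 < p < 1$.

```python
import math
import random

def geometric(p, rng=random):
    """Geometric(p) on 1, 2, ... as 1 + [log U / log(1 - p)], for 0 < p < 1."""
    u = 1.0 - rng.random()                 # in (0, 1], so log(u) is defined
    # both logs are <= 0, so the ratio is >= 0 and int() acts as the floor [.]
    return 1 + int(math.log(u) / math.log(1.0 - p))
```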


As in the continuous case, special simulation techniques have been developed for the 
more common discrete distributions. We now present certain of these. 


Example 11.8 (Simulating a Binomial Random Variable) A binomial (n, p) ran¬ 
dom variable can be most easily simulated by recalling that it can be expressed as the 
sum of n independent Bernoulli random variables. That is, if U \,..., U n are indepen¬ 
dent uniform (0, 1) variables, then letting 


1 , if Ui < P 
0 , otherwise 


it follows that X = ^21= l X,■ is a binomial random variable with parameters n and p. 

One difficulty with this procedure is that it requires the generation of n random 
numbers. To show how to reduce the number of random numbers needed, note first that 
this procedure does not use the actual value of a random number U but only whether or 
not it exceeds p. Using this and the result that the conditional distribution of U given 
that U < p is uniform on (0, p) and the conditional distribution of U given that U > p 
is uniform on (p, 1), we now show how we can simulate a binomial (n, p) random
variable using only a single random number:

Step 1: Let $\alpha = 1/p$, $\beta = 1/(1 - p)$.

Step 2: Set k = 0.

Step 3: Generate a uniform random number U.

Step 4: If k = n stop. Otherwise reset k to equal k + 1.

Step 5: If $U \leq p$ set $X_k = 1$ and reset U to equal $\alpha U$. If $U > p$ set $X_k = 0$ and reset
U to equal $\beta(U - p)$. Return to step 4.

This procedure generates $X_1, \ldots, X_n$ and $X = \sum_{i=1}^{n} X_i$ is the desired random
variable. It works by noting whether $U_k \leq p$ or $U_k > p$; in the former case it takes
$U_{k+1}$ to equal $U_k/p$, and in the latter case it takes $U_{k+1}$ to equal $(U_k - p)/(1 - p)$.¹

■ 
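Steps 1 through 5 can be sketched as below. The function name and the `rng` parameter are assumptions made for illustration. Because each rescaling uses up bits of the single uniform, this sketch is only sensible for small n, in line with the round-off caveat attached to the example.

```python
import random


def simulate_binomial_one_uniform(n, p, rng=random):
    """Binomial(n, p) from a single uniform by repeated rescaling (0 < p < 1)."""
    alpha, beta = 1.0 / p, 1.0 / (1.0 - p)   # Step 1
    u = rng.random()                          # Step 3
    x = 0
    for _ in range(n):                        # Steps 2 and 4: k = 1, ..., n
        if u <= p:                            # Step 5: success, rescale (0, p) to (0, 1)
            x += 1
            u = alpha * u
        else:                                 # failure, rescale (p, 1) to (0, 1)
            u = beta * (u - p)
    return x
```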

Example 11.9 (Simulating a Poisson Random Variable) To simulate a Poisson
random variable with mean $\lambda$, generate independent uniform (0, 1) random variables
$U_1, U_2, \ldots$, stopping at

$$
N + 1 = \min\left\{n : \prod_{i=1}^{n} U_i < e^{-\lambda}\right\}
$$

The random variable N has the desired distribution, which can be seen by noting that

$$
N = \max\left\{n : \sum_{i=1}^{n} -\log U_i < \lambda\right\}
$$

But $-\log U_i$ is exponential with rate 1, and so if we interpret $-\log U_i$, $i \geq 1$, as the
interarrival times of a Poisson process having rate 1, we see that $N = N(\lambda)$ would
equal the number of events by time $\lambda$. Hence N is Poisson with mean $\lambda$.
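The stopping rule on the running product of uniforms can be sketched as follows. The function name and `rng` parameter are illustrative assumptions, and the sketch is meant for moderate means, since $e^{-\lambda}$ underflows to zero in double precision once $\lambda$ is in the high hundreds.

```python
import math
import random


def simulate_poisson(lam, rng=random):
    """Return N, where N + 1 is the first n with U_1 * ... * U_n < exp(-lam)."""
    threshold = math.exp(-lam)
    n = 0
    product = rng.random()
    while product >= threshold:
        n += 1
        product *= rng.random()
    return n
```

The expected number of uniforms consumed is $\lambda + 1$, which is what motivates a different approach when $\lambda$ is large.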

When $\lambda$ is large we can reduce the amount of computation in the preceding simulation
of $N(\lambda)$, the number of events by time $\lambda$ of a Poisson process having rate 1, by first
choosing an integer m and simulating $S_m$, the time of the mth event of the Poisson
process, and then simulating $N(\lambda)$ according to the conditional distribution of $N(\lambda)$
given $S_m$. Now the conditional distribution of $N(\lambda)$ given $S_m$ is as follows:

$$
N(\lambda) \mid S_m = s \;\sim\; m + \text{Poisson}(\lambda - s), \quad \text{if } s < \lambda
$$

$$
N(\lambda) \mid S_m = s \;\sim\; \text{Binomial}\left(m - 1, \frac{\lambda}{s}\right), \quad \text{if } s > \lambda
$$

where ~ means “has the distribution of.” This follows since if the mth event occurs
at time s, where $s < \lambda$, then the number of events by time $\lambda$ is m plus the number of
events in $(s, \lambda)$. On the other hand, given that $S_m = s$ the set of times at which the first
m − 1 events occur has the same distribution as a set of m − 1 uniform (0, s) random
variables (see Section 5.3.5). Hence, when $\lambda < s$, the number of these that occur by
time $\lambda$ is binomial with parameters m − 1 and $\lambda/s$. Hence, we can simulate $N(\lambda)$ by
first simulating $S_m$ and then simulating either $P(\lambda - S_m)$, a Poisson random variable
with mean $\lambda - S_m$, when $S_m < \lambda$, or $\text{Bin}(m - 1, \lambda/S_m)$, a binomial random
variable with parameters m − 1 and $\lambda/S_m$, when $S_m > \lambda$; and then setting

$$
N(\lambda) = \begin{cases}
m + P(\lambda - S_m), & \text{if } S_m < \lambda \\
\text{Bin}(m - 1, \lambda/S_m), & \text{if } S_m > \lambda
\end{cases}
$$

¹ Because of computer round-off errors, a single random number should not be continuously used when n
is large.

In the preceding it has been found computationally effective to let m be approximately
Of course, $S_m$ is simulated by simulating from a gamma (m, 1) distribution (it is the sum of
m independent exponentials with rate 1) via an approach that is computationally fast
when m is large (see Section 11.3.3). ■
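A minimal sketch of this two-stage scheme is given below, under several stated assumptions: the caller supplies m, the gamma draw uses the standard library's `gammavariate` with shape m and scale 1 (the mth event time of a rate 1 process), and the binomial branch is simulated naively by counting uniforms. A practical version would replace that naive count with a fast binomial generator; the names here are illustrative only.

```python
import math
import random


def _poisson_by_products(mean, rng):
    """Product-of-uniforms Poisson sampler, used for the (small) residual mean."""
    threshold = math.exp(-mean)
    n, product = 0, rng.random()
    while product >= threshold:
        n += 1
        product *= rng.random()
    return n


def simulate_poisson_conditional(lam, m, rng=random):
    """Simulate N(lam) by first simulating S_m, the time of the m-th event."""
    s_m = rng.gammavariate(m, 1.0)          # shape m, scale 1
    if s_m < lam:
        # m events by time s_m, plus a Poisson number in (s_m, lam)
        return m + _poisson_by_products(lam - s_m, rng)
    # The first m - 1 event times are i.i.d. uniform (0, s_m); count those by time lam.
    return sum(rng.random() < lam / s_m for _ in range(m - 1))
```

For example, `simulate_poisson_conditional(50.0, 40)` returns one draw whose distribution is Poisson with mean 50.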

There are also rejection and hazard rate methods for discrete distributions but we
leave their development as exercises. However, there is a technique available for
simulating finite discrete random variables, called the alias method, which, though
requiring some setup time, is very fast to implement.


11.4.1 The Alias Method 

In what follows, the quantities $\mathbf{P}$, $\mathbf{P}^{(k)}$, $\mathbf{Q}^{(k)}$, $k \leq n - 1$, will represent probability mass
functions on the integers 1, 2, ..., n; that is, they will be n-vectors of nonnegative
numbers summing to 1. In addition, the vector $\mathbf{P}^{(k)}$ will have at most k nonzero
components, and each of the $\mathbf{Q}^{(k)}$ will have at most two nonzero components. We show that
any probability mass function $\mathbf{P}$ can be represented as an equally weighted mixture of
n − 1 probability mass functions $\mathbf{Q}$ (each having at most two nonzero components).
That is, we show that for suitably defined $\mathbf{Q}^{(1)}, \ldots, \mathbf{Q}^{(n-1)}$, $\mathbf{P}$ can be expressed as

$$
\mathbf{P} = \frac{1}{n-1} \sum_{k=1}^{n-1} \mathbf{Q}^{(k)} \tag{11.6}
$$


As a prelude to presenting the method for obtaining this representation, we will need 
the following simple lemma whose proof is left as an exercise. 

Lemma 11.5 Let $\mathbf{P} = \{P_i, i = 1, \ldots, n\}$ denote a probability mass function, then

(a) there exists an i, $1 \leq i \leq n$, such that $P_i < 1/(n - 1)$, and

(b) for this i, there exists a j, $j \neq i$, such that $P_i + P_j \geq 1/(n - 1)$.

Before presenting the general technique for obtaining the representation of Equation 
(11.6), let us illustrate it by an example. 

Example 11.10 Consider the three-point distribution $\mathbf{P}$ with $P_1 = \frac{7}{16}$, $P_2 = \frac{1}{2}$,
$P_3 = \frac{1}{16}$. We start by choosing i and j such that they satisfy the conditions of Lemma 11.5.
As $P_3 < \frac{1}{2}$ and $P_3 + P_2 > \frac{1}{2}$, we can work with i = 3 and j = 2. We will now define
a two-point mass function $\mathbf{Q}^{(1)}$ putting all of its weight on 3 and 2 and such that $\mathbf{P}$ will
be expressible as an equally weighted mixture between $\mathbf{Q}^{(1)}$ and a second two-point
mass function $\mathbf{Q}^{(2)}$. Secondly, all of the mass of point 3 will be contained in $\mathbf{Q}^{(1)}$. As
we will have

$$
P_j = \tfrac{1}{2}\left(Q_j^{(1)} + Q_j^{(2)}\right), \quad j = 1, 2, 3 \tag{11.7}
$$

and, by the preceding, $Q_3^{(2)}$ is supposed to equal 0, we must therefore take

$$
Q_3^{(1)} = 2P_3 = \tfrac{1}{8}, \quad Q_2^{(1)} = 1 - Q_3^{(1)} = \tfrac{7}{8}, \quad Q_1^{(1)} = 0
$$

To satisfy Equation (11.7), we must then set

$$
Q_3^{(2)} = 0, \quad Q_2^{(2)} = 2P_2 - \tfrac{7}{8} = \tfrac{1}{8}, \quad Q_1^{(2)} = 2P_1 = \tfrac{7}{8}
$$

Hence, we have the desired representation in this case. Suppose now that the original 
distribution was the following four-point mass function: 


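$$
P_1 = \frac{7}{16}, \quad P_2 = \frac{1}{4}, \quad P_3 = \frac{1}{8}, \quad P_4 = \frac{3}{16}
$$

Now, $P_3 < \frac{1}{3}$ and $P_3 + P_1 > \frac{1}{3}$. Hence our initial two-point mass function, $\mathbf{Q}^{(1)}$, will
concentrate on points 3 and 1 (giving no weights to 2 and 4). As the final representation
will give weight $\frac{1}{3}$ to $\mathbf{Q}^{(1)}$ and in addition the other $\mathbf{Q}^{(j)}$, j = 2, 3, will not give any
mass to the value 3, we must have

$$
\frac{1}{3} Q_3^{(1)} = P_3 = \frac{1}{8}
$$

Hence,

$$
Q_3^{(1)} = \frac{3}{8}, \qquad Q_1^{(1)} = 1 - \frac{3}{8} = \frac{5}{8}
$$

Also, we can write

$$
\mathbf{P} = \frac{1}{3}\mathbf{Q}^{(1)} + \frac{2}{3}\mathbf{P}^{(3)}
$$

where $\mathbf{P}^{(3)}$, to satisfy the preceding, must be the vector

$$
P_1^{(3)} = \frac{3}{2}\left(P_1 - \frac{1}{3}Q_1^{(1)}\right) = \frac{11}{32}, \quad
P_2^{(3)} = \frac{3}{2}P_2 = \frac{3}{8}, \quad
P_3^{(3)} = 0, \quad
P_4^{(3)} = \frac{3}{2}P_4 = \frac{9}{32}
$$

Note that $\mathbf{P}^{(3)}$ gives no mass to the value 3. We can now express the mass function $\mathbf{P}^{(3)}$
as an equally weighted mixture of two-point mass functions $\mathbf{Q}^{(2)}$ and $\mathbf{Q}^{(3)}$, and we will
end up with

$$
\mathbf{P} = \frac{1}{3}\mathbf{Q}^{(1)} + \frac{2}{3}\left(\frac{1}{2}\mathbf{Q}^{(2)} + \frac{1}{2}\mathbf{Q}^{(3)}\right)
  = \frac{1}{3}\left(\mathbf{Q}^{(1)} + \mathbf{Q}^{(2)} + \mathbf{Q}^{(3)}\right)
$$

(We leave it as an exercise for you to fill in the details.)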
The preceding example outlines the following general procedure for writing the
n-point mass function $\mathbf{P}$ in the form of Equation (11.6) where each of the $\mathbf{Q}^{(i)}$ are
mass functions giving all their mass to at most two points. To start, we choose i and
j satisfying the conditions of Lemma 11.5. We now define the mass function $\mathbf{Q}^{(1)}$
concentrating on the points i and j and which will contain all of the mass for point i by
noting that, in the representation of Equation (11.6), $Q_i^{(k)} = 0$ for k = 2, ..., n − 1,
implying that

$$
Q_i^{(1)} = (n - 1)P_i, \quad \text{and so} \quad Q_j^{(1)} = 1 - (n - 1)P_i
$$

Writing

$$
\mathbf{P} = \frac{1}{n-1}\mathbf{Q}^{(1)} + \frac{n-2}{n-1}\mathbf{P}^{(n-1)} \tag{11.8}
$$

where $\mathbf{P}^{(n-1)}$ represents the remaining mass, we see that

$$
\begin{aligned}
P_i^{(n-1)} &= 0, \\
P_j^{(n-1)} &= \frac{n-1}{n-2}\left(P_j - \frac{1}{n-1}Q_j^{(1)}\right) = \frac{n-1}{n-2}\left(P_i + P_j - \frac{1}{n-1}\right), \\
P_k^{(n-1)} &= \frac{n-1}{n-2}P_k, \quad k \neq i \text{ or } j
\end{aligned}
$$

That the foregoing is indeed a probability mass function is easily checked; for instance,
the nonnegativity of $P_j^{(n-1)}$ follows from the fact that j was chosen so that
$P_i + P_j \geq 1/(n - 1)$.

We may now repeat the foregoing procedure on the (n − 1)-point probability mass
function $\mathbf{P}^{(n-1)}$ to obtain

$$
\mathbf{P}^{(n-1)} = \frac{1}{n-2}\mathbf{Q}^{(2)} + \frac{n-3}{n-2}\mathbf{P}^{(n-2)}
$$

and thus from Equation (11.8) we have

$$
\mathbf{P} = \frac{1}{n-1}\mathbf{Q}^{(1)} + \frac{1}{n-1}\mathbf{Q}^{(2)} + \frac{n-3}{n-1}\mathbf{P}^{(n-2)}
$$

We now repeat the procedure on $\mathbf{P}^{(n-2)}$ and so on until we finally obtain

$$
\mathbf{P} = \frac{1}{n-1}\left(\mathbf{Q}^{(1)} + \cdots + \mathbf{Q}^{(n-1)}\right)
$$

In this way we are able to represent $\mathbf{P}$ as an equally weighted mixture of n − 1 two-point
mass functions. We can now easily simulate from $\mathbf{P}$ by first generating a random
integer N equally likely to be either 1, 2, ..., or n − 1. If the resulting value N is
such that $\mathbf{Q}^{(N)}$ puts positive weight only on the points $i_N$ and $j_N$, then we can set X
equal to $i_N$ if a second random number is less than $Q_{i_N}^{(N)}$ and equal to $j_N$ otherwise.
The random variable X will have probability mass function $\mathbf{P}$. That is, we have the
following procedure for simulating from $\mathbf{P}$:

Step 1: Generate $U_1$ and set $N = 1 + \lfloor (n - 1)U_1 \rfloor$.

Step 2: Generate $U_2$ and set

$$
X = \begin{cases} i_N, & \text{if } U_2 < Q_{i_N}^{(N)} \\ j_N, & \text{otherwise} \end{cases}
$$


Remarks

(i) The preceding is called the alias method because by a renumbering of the $\mathbf{Q}$s we
can always arrange things so that for each k, $Q_k^{(k)} > 0$. (That is, we can arrange
things so that the kth two-point mass function gives positive weight to the value k.)
Hence, the procedure calls for simulating N, equally likely to be 1, 2, ..., or n − 1,
and then if N = k it either accepts k as the value of X, or it accepts for the value
of X the “alias” of k (namely, the other value that $\mathbf{Q}^{(k)}$ gives positive weight).

(ii) Actually, it is not necessary to generate a new random number in step 2. Because
N − 1 is the integer part of $(n - 1)U_1$, it follows that the remainder $(n - 1)U_1 - (N - 1)$
is independent of N and is uniformly distributed in (0, 1). Hence, rather than
generating a new random number $U_2$ in step 2, we can use

$$
(n - 1)U_1 - (N - 1) = (n - 1)U_1 - \lfloor (n - 1)U_1 \rfloor
$$
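The construction and the two-step sampler can be sketched as follows. This is one possible realization under stated assumptions: it works with the scaled masses $w = (n - 1)\mathbf{P}$ so that no renormalization is needed at each stage, it picks i as the smallest and j as the largest remaining mass (a choice that satisfies Lemma 11.5), it uses 0-based indices, and it reuses the fractional part of $(n - 1)U_1$ as in Remark (ii). The function names are illustrative, and the quadratic-time setup is kept simple rather than fast.

```python
import random


def build_alias_tables(probs):
    """Return (i_pts, j_pts, q): Q^(k) puts mass q[k] on i_pts[k] and 1 - q[k] on j_pts[k]."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two points")
    w = {idx: (n - 1) * p for idx, p in enumerate(probs)}  # scaled remaining mass
    i_pts, j_pts, q = [], [], []
    while len(w) > 1:
        i = min(w, key=w.get)          # smallest remaining mass, at most 1 after scaling
        w_i = w.pop(i)
        j = max(w, key=w.get)          # largest remaining mass, so w_i + w_j >= 1
        i_pts.append(i)
        j_pts.append(j)
        q.append(min(w_i, 1.0))
        w[j] = max(w[j] - (1.0 - w_i), 0.0)   # mass of j not absorbed by this Q
    return i_pts, j_pts, q


def alias_sample(tables, rng=random):
    """Draw one value using a single uniform: integer part picks Q, remainder picks the point."""
    i_pts, j_pts, q = tables
    x = len(q) * rng.random()
    k = min(int(x), len(q) - 1)        # N - 1 in the text's numbering
    remainder = x - k                  # uniform on (0, 1), independent of k
    return i_pts[k] if remainder < q[k] else j_pts[k]
```

For the four-point example above, `build_alias_tables([7/16, 1/4, 1/8, 3/16])` first pairs point 3 with point 1 (indices 2 and 0 here) with weights 3/8 and 5/8, matching the hand computation.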

Example 11.11 Let us return to the problem of Example 11.1, which considers a list
of n, not necessarily distinct, items. Each item has a value, v(i) being the value of the
item in position i, and we are interested in estimating

$$
v = \sum_{i=1}^{n} v(i)/m(i)
$$

where m(i) is the number of times the item in position i appears on the list. In words,
v is the sum of the values of the (distinct) items on the list.

To estimate v, note that if X is a random variable such that

$$
P\{X = i\} = v(i) \Big/ \sum_{j=1}^{n} v(j), \quad i = 1, \ldots, n
$$

then

$$
E[1/m(X)] = \frac{\sum_i v(i)/m(i)}{\sum_j v(j)} = v \Big/ \sum_{j=1}^{n} v(j)
$$

Hence, we can estimate v by using the alias (or any other) method to generate
independent random variables $X_1, \ldots, X_k$ having the same distribution as X and then
estimating v by

$$
v \approx \frac{1}{k} \sum_{j=1}^{n} v(j) \sum_{i=1}^{k} 1/m(X_i)
$$

11.5 Stochastic Processes

We can easily simulate a stochastic process by simulating a sequence of random
variables. For instance, to simulate the first t time units of a renewal process having
interarrival distribution F we can simulate independent random variables $X_1, X_2, \ldots$ having
distribution F, stopping at

$$
N = \min\{n : X_1 + \cdots + X_n > t\}
$$

The $X_i$, $i \geq 1$, represent the interarrival times of the renewal process and so the
preceding simulation yields N − 1 events by time t, the events occurring at times
$X_1, X_1 + X_2, \ldots, X_1 + \cdots + X_{N-1}$.
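As a small illustration of this stopping rule, the sketch below accumulates interarrival times until the running sum exceeds t. The function name and the convention that `draw_interarrival` is a caller-supplied sampler taking the random source as its argument are assumptions made for the example.

```python
import random


def simulate_renewal(t, draw_interarrival, rng=random):
    """Return the event times X_1, X_1 + X_2, ... of a renewal process that occur by time t."""
    times, clock = [], 0.0
    while True:
        clock += draw_interarrival(rng)   # next interarrival time, distribution F
        if clock > t:                     # this is the N-th partial sum; stop
            return times
        times.append(clock)
```

For instance, `simulate_renewal(10.0, lambda r: r.expovariate(2.0))` returns the event times of a rate 2 Poisson process on (0, 10).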

Actually there is another approach for simulating a Poisson process that is quite
efficient. Suppose we want to simulate the first t time units of a Poisson process having
rate $\lambda$. To do so, we can first simulate N(t), the number of events by t, and then use the
result that given N(t) = n, the set of n event times is distributed as a set of
n independent uniform (0, t) random variables. Hence, we start by simulating N(t), a
Poisson random variable with mean $\lambda t$ (by one of the methods given in Example 11.9).
Then, if N(t) = n, generate a new set of n random numbers, call them $U_1, \ldots, U_n$,
and $\{tU_1, \ldots, tU_n\}$ will represent the set of N(t) event times. If we could stop here this
would be much more efficient than simulating the exponentially distributed interarrival
times. However, we usually desire the event times in increasing order; for instance,
for s < t,

$$
N(s) = \text{number of } i : tU_i \leq s
$$

and so to compute the function N(s), $s \leq t$, it is best to first order the values $U_i$,
i = 1, ..., n, before multiplying by t. However, in doing so you should not use an
all-purpose sorting algorithm, such as quick sort (see Example 3.14), but rather one that
takes into account that the elements to be sorted come from a uniform (0, 1) population.
Such a sorting algorithm of n uniform (0, 1) variables is as follows: Rather than a single
list to be sorted of length n we will consider n ordered, or linked, lists of random size.
The value U will be put in list i if its value is between (i − 1)/n and i/n; that is, U is
put in list $\lfloor nU \rfloor + 1$. The individual lists are then ordered, and the total linkage of all the
lists is the desired ordering. As almost all of the n lists will be of relatively small size
(for instance, if n = 1000 the mean number of lists of size greater than 4 is (using the
Poisson approximation to the binomial) approximately equal to $1000\left(1 - \frac{65}{24}e^{-1}\right) \approx 4$)
the sorting of individual lists will be quite quick, and so the running time of such an
algorithm will be proportional to n (rather than to n log n as in the best all-purpose
sorting algorithms).
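The count-then-place approach with the bucketed ordering can be sketched as below. The function names are illustrative assumptions; the count is drawn with the product-of-uniforms method of Example 11.9, so the sketch is intended for moderate values of the mean, and plain Python lists stand in for the linked lists described in the text.

```python
import math
import random


def sorted_uniforms(n, rng=random):
    """Generate n uniform (0, 1) values in increasing order using n buckets."""
    buckets = [[] for _ in range(n)]
    for _ in range(n):
        u = rng.random()
        buckets[min(int(n * u), n - 1)].append(u)   # list floor(nU) + 1 in 1-based terms
    ordered = []
    for bucket in buckets:
        bucket.sort()              # each bucket is tiny on average
        ordered.extend(bucket)
    return ordered


def poisson_process_times(rate, t, rng=random):
    """Event times on (0, t): draw N(t), then place N(t) ordered uniforms scaled by t."""
    threshold = math.exp(-rate * t)
    n, product = 0, rng.random()
    while product >= threshold:    # product-of-uniforms Poisson count
        n += 1
        product *= rng.random()
    return [t * u for u in sorted_uniforms(n, rng)]
```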

An extremely important counting process for modeling purposes is the nonhomogeneous
Poisson process, which relaxes the Poisson process assumption of stationary
increments. Thus it allows for the possibility that the arrival rate need not be constant
but can vary with time. However, there are few analytical studies that assume a
nonhomogeneous Poisson arrival process for the simple reason that such models are
not usually mathematically tractable. (For example, there is no known expression for
the average customer delay in the single-server exponential service distribution queueing
model that assumes a nonhomogeneous arrival process.)² Clearly such models are
strong candidates for simulation studies.

11.5.1 Simulating a Nonhomogeneous Poisson Process 

We now present three methods for simulating a nonhomogeneous Poisson process
having intensity function $\lambda(t)$, $0 \leq t < \infty$.

Method 1. Sampling a Poisson Process

To simulate the first T time units of a nonhomogeneous Poisson process with intensity
function $\lambda(t)$, let $\lambda$ be such that

$$
\lambda(t) \leq \lambda \quad \text{for all } t \leq T
$$

Now, as shown in Chapter 5, such a nonhomogeneous Poisson process can be generated
by a random selection of the event times of a Poisson process having rate $\lambda$. That is, if
an event of a Poisson process with rate $\lambda$ that occurs at time t is counted (independently
of what has transpired previously) with probability $\lambda(t)/\lambda$ then the process of counted
events is a nonhomogeneous Poisson process with intensity function $\lambda(t)$, $0 \leq t \leq T$.
Hence, by simulating a Poisson process and then randomly counting its events, we can
generate the desired nonhomogeneous Poisson process. We thus have the following
procedure:

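Generate independent random variables $X_1, U_1, X_2, U_2, \ldots$ where the $X_i$ are
exponential with rate $\lambda$ and the $U_i$ are random numbers, stopping at

$$
N = \min\left\{n : \sum_{i=1}^{n} X_i > T\right\}
$$

Now let, for $j = 1, \ldots, N - 1$,

$$
I_j = \begin{cases}
1, & \text{if } U_j \leq \lambda\left(\sum_{i=1}^{j} X_i\right)\big/\lambda \\
0, & \text{otherwise}
\end{cases}
$$

and set

$$
J = \{j : I_j = 1\}
$$

Thus, the counting process having events at the set of times $\left\{\sum_{i=1}^{j} X_i : j \in J\right\}$
constitutes the desired process.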
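The procedure just described can be sketched in a few lines. The function name, the `rng` parameter, and the convention that `intensity` is a Python callable bounded by `lam_max` on (0, horizon) are assumptions made for this illustration.

```python
import random


def thinning(intensity, lam_max, horizon, rng=random):
    """Event times on (0, horizon) of a nonhomogeneous Poisson process.

    Assumes intensity(t) <= lam_max for all 0 <= t <= horizon.
    """
    events, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)          # next point of the rate lam_max process
        if t > horizon:
            return events
        if rng.random() <= intensity(t) / lam_max:
            events.append(t)                   # counted with probability intensity(t)/lam_max
```

With `intensity = lambda s: 2 * s`, `lam_max = 6.0` and `horizon = 3.0`, the expected number of returned events is $\int_0^3 2s\,ds = 9$, while the expected number of candidate points examined is 18.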

The foregoing procedure, referred to as the thinning algorithm (because it “thins”
the homogeneous Poisson points) will clearly be most efficient, in the sense of having
the fewest number of rejected event times, when $\lambda(t)$ is near $\lambda$ throughout the interval.

² One queueing model that assumes a nonhomogeneous Poisson arrival process and is mathematically
tractable is the infinite server model.

Thus, an obvious improvement is to break up the interval into subintervals and then
use the procedure over each subinterval. That is, determine appropriate values k,
$0 < t_1 < t_2 < \cdots < t_k < T$, $\lambda_1, \ldots, \lambda_{k+1}$, such that

$$
\lambda(s) \leq \lambda_i \quad \text{when } t_{i-1} \leq s < t_i, \; i = 1, \ldots, k + 1 \quad (\text{where } t_0 = 0, \; t_{k+1} = T) \tag{11.9}
$$

Now simulate the nonhomogeneous Poisson process over the interval $(t_{i-1}, t_i)$ by
generating exponential random variables with rate $\lambda_i$ and accepting the generated event
occurring at time s, $s \in (t_{i-1}, t_i)$, with probability $\lambda(s)/\lambda_i$. Because of the memoryless
property of the exponential and the fact that the rate of an exponential can be changed
upon multiplication by a constant, it follows that there is no loss of efficiency in going
from one subinterval to the next. In other words, if we are at $t \in [t_{i-1}, t_i)$ and generate
X, an exponential with rate $\lambda_i$, which is such that $t + X > t_i$ then we can use
$\lambda_i[X - (t_i - t)]/\lambda_{i+1}$ as the next exponential with rate $\lambda_{i+1}$. Thus, we have the following algorithm
for generating the first T time units of a nonhomogeneous Poisson process with intensity
function $\lambda(s)$ when the relations (11.9) are satisfied. In the algorithm, t will represent
the present time and I the present interval (that is, I = i when $t_{i-1} \leq t < t_i$).

Step 1: t = 0, I = 1.

Step 2: Generate an exponential random variable X having rate $\lambda_I$.

Step 3: If $t + X < t_I$, reset t = t + X, generate a random number U, and accept the
event time t if $U \leq \lambda(t)/\lambda_I$. Return to step 2.

Step 4: (Step reached if $t + X \geq t_I$.) Stop if I = k + 1. Otherwise, reset
$X = (X - t_I + t)\lambda_I/\lambda_{I+1}$. Also reset $t = t_I$ and I = I + 1, and go to step 3.
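Steps 1 through 4 can be sketched as follows. The argument layout is an assumption made for the example: `breakpoints` holds $t_1, \ldots, t_k, T$ and `bounds` holds $\lambda_1, \ldots, \lambda_{k+1}$, with the interval index kept 0-based.

```python
import random


def piecewise_thinning(intensity, breakpoints, bounds, rng=random):
    """Thinning with a separate bound on each subinterval.

    breakpoints = [t_1, ..., t_k, T] and bounds = [lam_1, ..., lam_{k+1}], with
    intensity(s) <= bounds[i] for breakpoints[i-1] <= s < breakpoints[i] (t_0 = 0).
    """
    events = []
    t, interval = 0.0, 0                                   # Step 1
    x = rng.expovariate(bounds[interval])                  # Step 2
    while True:
        if t + x < breakpoints[interval]:                  # Step 3
            t += x
            if rng.random() <= intensity(t) / bounds[interval]:
                events.append(t)
            x = rng.expovariate(bounds[interval])          # back to Step 2
        else:                                              # Step 4
            if interval == len(bounds) - 1:
                return events
            # Reuse the overshoot, rescaled to the next interval's rate.
            x = (x - (breakpoints[interval] - t)) * bounds[interval] / bounds[interval + 1]
            t = breakpoints[interval]
            interval += 1
```

For example, with `intensity = lambda s: 1 + s`, `breakpoints = [1.0, 2.0]` and `bounds = [2.0, 3.0]`, the expected number of candidate points is 5 rather than the 6 required by a single bound of 3.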

Suppose now that over some subinterval $(t_{i-1}, t_i)$ it follows that $\underline{\lambda}_i > 0$ where

$$
\underline{\lambda}_i \equiv \inf\{\lambda(s) : t_{i-1} \leq s < t_i\}
$$

In such a situation, we should not use the thinning algorithm directly but rather should
first simulate a Poisson process with rate $\underline{\lambda}_i$ over the desired interval and then simulate
a nonhomogeneous Poisson process with the intensity function $\lambda(s) - \underline{\lambda}_i$ when
$s \in (t_{i-1}, t_i)$. (The final exponential generated for the Poisson process, which carries
one beyond the desired boundary, need not be wasted but can be suitably transformed
so as to be reusable.) The superposition (or, merging) of the two processes yields the
desired process over the interval. The reason for doing it this way is that it saves the
need to generate uniform random variables for a Poisson distributed number, with mean
$\underline{\lambda}_i(t_i - t_{i-1})$, of the event times. For instance, consider the case where

$$
\lambda(s) = 10 + s, \quad 0 < s < 1
$$

Using the thinning method with $\lambda = 11$ would generate an expected number of 11
events each of which would require a random number to determine whether or not to
accept it. On the other hand, to generate a Poisson process with rate 10 and then merge
it with a generated nonhomogeneous Poisson process with rate $\lambda(s) = s$, 0 < s < 1,
would yield an equally distributed number of event times but with the expected number
needing to be checked to determine acceptance being equal to 1.

Figure 11.3


Another way to make the simulation of nonhomogeneous Poisson processes more
efficient is to make use of superpositions. For instance, consider the process where

$$
\lambda(t) = \begin{cases}
\exp\{t^2\}, & 0 < t < 1.5 \\
\exp\{2.25\}, & 1.5 < t < 2.5 \\
\exp\{(4 - t)^2\}, & 2.5 < t < 4
\end{cases}
$$

A plot of this intensity function is given in Figure 11.3. One way of simulating this
process up to time 4 is to first generate a Poisson process with rate 1 over this interval;
then generate a Poisson process with rate e − 1 over this interval, accept all events in
(1, 3), and only accept an event at time t that is not contained in (1, 3) with probability
$[\lambda(t) - 1]/(e - 1)$; then generate a Poisson process with rate $e^{2.25} - e$ over the interval
(1, 3), accepting all event times between 1.5 and 2.5 and any event time t outside this
interval with probability $[\lambda(t) - e]/(e^{2.25} - e)$. The superposition of these processes
is the desired nonhomogeneous Poisson process. In other words, what we have done is
to break up $\lambda(t)$ into the following nonnegative parts:

$$
\lambda(t) = \lambda_1(t) + \lambda_2(t) + \lambda_3(t), \quad 0 < t < 4
$$

where

$$
\lambda_1(t) \equiv 1,
$$

$$
\lambda_2(t) = \begin{cases}
\lambda(t) - 1, & 0 < t < 1 \\
e - 1, & 1 < t < 3 \\
\lambda(t) - 1, & 3 < t < 4
\end{cases}
$$

$$
\lambda_3(t) = \begin{cases}
0, & 0 < t < 1 \\
\lambda(t) - e, & 1 < t < 1.5 \\
e^{2.25} - e, & 1.5 < t < 2.5 \\
\lambda(t) - e, & 2.5 < t < 3 \\
0, & 3 < t < 4
\end{cases}
$$

and where the thinning algorithm (with a single interval in each case) was used to
simulate the constituent nonhomogeneous processes.

Method 2. Conditional Distribution of the Arrival Times 

Recall the result for a Poisson process having rate $\lambda$ that given the number of events by
time T the set of event times are independent and identically distributed uniform (0, T)
random variables. Now suppose that each of these events is independently counted with
a probability that is equal to $\lambda(t)/\lambda$ when the event occurred at time t. Hence, given
the number of counted events, it follows that the set of times of these counted events
are independent with a common distribution given by F(s), where

$$
\begin{aligned}
F(s) &= P\{\text{time} \leq s \mid \text{counted}\} \\
&= \frac{P\{\text{time} \leq s, \text{counted}\}}{P\{\text{counted}\}} \\
&= \frac{\int_0^T P\{\text{time} \leq s, \text{counted} \mid \text{time} = x\}\, dx/T}{P\{\text{counted}\}} \\
&= \frac{\int_0^s \lambda(x)\, dx}{\int_0^T \lambda(x)\, dx}
\end{aligned}
$$

The preceding (somewhat heuristic) argument thus shows that given n events of a
nonhomogeneous Poisson process by time T the n event times are independent with a
common density function

$$
f(s) = \frac{\lambda(s)}{m(T)}, \quad 0 < s < T, \qquad m(T) = \int_0^T \lambda(s)\, ds \tag{11.10}
$$

Since N(T), the number of events by time T, is Poisson distributed with mean m(T),
we can simulate the nonhomogeneous Poisson process by first simulating N(T) and
then simulating N(T) random variables from the density function of (11.10).

Example 11.12 If $\lambda(s) = cs$, then we can simulate the first T time units of the
nonhomogeneous Poisson process by first simulating N(T), a Poisson random variable
having mean $m(T) = \int_0^T cs\, ds = cT^2/2$, and then simulating N(T) random variables
having distribution

$$
F(s) = \frac{s^2}{T^2}, \quad 0 < s < T
$$

Random variables having the preceding distribution either can be simulated by use
of the inverse transform method (since $F^{-1}(U) = T\sqrt{U}$) or by noting that F is the
distribution function of $\max(TU_1, TU_2)$ when $U_1$ and $U_2$ are independent random
numbers. ■
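A minimal sketch of this example, using the inverse transform $T\sqrt{U}$ for the event times, is given below. The function name is an assumption, the Poisson count is drawn with the product-of-uniforms method (so the mean $cT^2/2$ should be moderate), and the final sort is included only because the times are usually wanted in increasing order.

```python
import math
import random


def simulate_linear_intensity(c, horizon, rng=random):
    """Event times on (0, horizon) of a process with intensity lambda(s) = c * s."""
    mean = c * horizon ** 2 / 2.0              # m(T) = c T^2 / 2
    threshold = math.exp(-mean)
    n, product = 0, rng.random()
    while product >= threshold:                # N(T) ~ Poisson(m(T))
        n += 1
        product *= rng.random()
    # Each event time has distribution F(s) = s^2 / T^2, so F^{-1}(U) = T * sqrt(U).
    return sorted(horizon * math.sqrt(rng.random()) for _ in range(n))
```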

If the distribution function specified by Equation (11.10) is not easily invertible,
we can always simulate from (11.10) by using the rejection method where we either
accept or reject simulated values of uniform (0, T) random variables. That is, let
h(s) = 1/T, 0 < s < T. Then

$$
\frac{f(s)}{h(s)} = \frac{T\lambda(s)}{m(T)} \leq \frac{\lambda T}{m(T)} \equiv C
$$

where $\lambda$ is a bound on $\lambda(s)$, $0 \leq s \leq T$. Hence, the rejection method is to generate
random numbers $U_1$ and $U_2$ then accept $TU_1$ if

$$
U_2 \leq \frac{f(TU_1)}{C\,h(TU_1)}
$$

or, equivalently, if

$$
U_2 \leq \frac{\lambda(TU_1)}{\lambda}
$$


Method 3. Simulating the Event Times 

The third method we shall present for simulating a nonhomogeneous Poisson process
having intensity function $\lambda(t)$, $t \geq 0$, is probably the most basic approach: namely,
to simulate the successive event times. So let $X_1, X_2, \ldots$ denote the event times of
such a process. As these random variables are dependent we will use the conditional
distribution approach to simulation. Hence, we need the conditional distribution of $X_i$
given $X_1, \ldots, X_{i-1}$.

To start, note that if an event occurs at time x then, independent of what has occurred
prior to x, the time until the next event has the distribution $F_x$ given by

$$
\begin{aligned}
\bar{F}_x(t) = 1 - F_x(t) &= P\{0 \text{ events in } (x, x + t) \mid \text{event at } x\} \\
&= P\{0 \text{ events in } (x, x + t)\} \quad \text{by independent increments} \\
&= \exp\left\{-\int_0^t \lambda(x + y)\, dy\right\}
\end{aligned}
$$

Differentiation yields that the density corresponding to $F_x$ is

$$
f_x(t) = \lambda(x + t) \exp\left\{-\int_0^t \lambda(x + y)\, dy\right\}
$$

implying that the hazard rate function of $F_x$ is

$$
r_x(t) = \frac{f_x(t)}{\bar{F}_x(t)} = \lambda(x + t)
$$

We can now simulate the event times $X_1, X_2, \ldots$ by simulating $X_1$ from $F_0$; then if
the simulated value of $X_1$ is $x_1$, simulate $X_2$ by adding $x_1$ to a value generated from $F_{x_1}$,
and if this sum is $x_2$ simulate $X_3$ by adding $x_2$ to a value generated from $F_{x_2}$, and so on.
The method used to simulate from these distributions should depend, of course, on the
form of these distributions. However, it is interesting to note that if we let $\lambda$ be such that
$\lambda(t) \leq \lambda$ and use the hazard rate method to simulate, then we end up with the approach
of Method 1 (we leave the verification of this fact as an exercise). Sometimes, however,
the distributions $F_x$ can be easily inverted and so the inverse transform method can be
applied.

Example 11.13 Suppose that $\lambda(x) = 1/(x + a)$, $x \geq 0$. Then

$$
\int_0^t \lambda(x + y)\, dy = \log\left(\frac{x + a + t}{x + a}\right)
$$

Hence,

$$
F_x(t) = 1 - \frac{x + a}{x + a + t} = \frac{t}{x + a + t}
$$

and so

$$
F_x^{-1}(u) = (x + a)\frac{u}{1 - u}
$$

We can, therefore, simulate the successive event times $X_1, X_2, \ldots$ by generating
$U_1, U_2, \ldots$ and then setting

$$
X_1 = \frac{aU_1}{1 - U_1}, \qquad
X_2 = (X_1 + a)\frac{U_2}{1 - U_2} + X_1
$$

and, in general,

$$
X_j = (X_{j-1} + a)\frac{U_j}{1 - U_j} + X_{j-1}, \quad j \geq 2 \qquad ■
$$
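The recursion for the event times can be sketched directly. The function name and the `horizon` argument (a time at which to stop generating events) are assumptions added for illustration; the example itself does not specify a stopping time.

```python
import random


def event_times_reciprocal_intensity(a, horizon, rng=random):
    """Successive event times for intensity lambda(x) = 1 / (x + a), stopped at horizon."""
    times, x = [], 0.0
    while True:
        u = rng.random()                       # in [0, 1), so 1 - u is never zero
        x = (x + a) * u / (1.0 - u) + x        # X_j = (X_{j-1} + a) U_j / (1 - U_j) + X_{j-1}
        if x > horizon:
            return times
        times.append(x)
```

As a consistency check, the number of events by time T should be Poisson with mean $m(T) = \log((T + a)/a)$; with a = 1 and $T = e^3 - 1$ this mean equals 3.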

11.5.2 Simulating a Two-Dimensional Poisson Process 

A point process consisting of randomly occurring points in the plane is said to be a
two-dimensional Poisson process having rate $\lambda$ if

(a) the number of points in any given region of area A is Poisson distributed with mean
$\lambda A$; and

(b) the numbers of points in disjoint regions are independent.

For a given fixed point O in the plane, we now show how to simulate events occurring
according to a two-dimensional Poisson process with rate $\lambda$ in a circular region of radius
r centered about O. Let $R_i$, $i \geq 1$, denote the distance between O and its ith nearest
Poisson point, and let C(a) denote the circle of radius a centered at O. Then

$$
P\{\pi R_1^2 > b\} = P\left\{R_1 > \sqrt{b/\pi}\right\} = P\left\{\text{no points in } C\left(\sqrt{b/\pi}\right)\right\} = e^{-\lambda b}
$$

Also, with $C(a_2) - C(a_1)$ denoting the region between $C(a_2)$ and $C(a_1)$:

$$
\begin{aligned}
P\{\pi R_2^2 - \pi R_1^2 > b \mid R_1 = r\}
&= P\left\{R_2 > \sqrt{(b + \pi r^2)/\pi} \,\Big|\, R_1 = r\right\} \\
&= P\left\{\text{no points in } C\left(\sqrt{(b + \pi r^2)/\pi}\right) - C(r) \,\Big|\, R_1 = r\right\} \\
&= P\left\{\text{no points in } C\left(\sqrt{(b + \pi r^2)/\pi}\right) - C(r)\right\} \quad \text{by (b)} \\
&= e^{-\lambda b}
\end{aligned}
$$

In fact, the same argument can be repeated to obtain the following. 

Proposition 11.6 With $R_0 = 0$,

$$
\pi R_i^2 - \pi R_{i-1}^2, \quad i \geq 1,
$$

are independent exponentials with rate $\lambda$.

In other words, the amount of area that needs to be traversed to encompass a Poisson
point is exponential with rate $\lambda$. Since, by symmetry, the respective angles of the
Poisson points are independent and uniformly distributed over $(0, 2\pi)$, we thus have
the following algorithm for simulating the Poisson process over a circular region of
radius r about O:

Step 1: Generate independent exponentials with rate 1, $X_1, X_2, \ldots$, stopping at

$$
N = \min\left\{n : \frac{X_1 + \cdots + X_n}{\lambda\pi} > r^2\right\}
$$

Step 2: If N = 1, stop. There are no points in C(r). Otherwise, for i = 1, ..., N − 1,
set

$$
R_i = \sqrt{(X_1 + \cdots + X_i)/\lambda\pi}
$$

Step 3: Generate independent uniform (0, 1) random variables $U_1, \ldots, U_{N-1}$.

Step 4: Return the N − 1 Poisson points in C(r) whose polar coordinates are

$$
(R_i, 2\pi U_i), \quad i = 1, \ldots, N - 1
$$
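Steps 1 through 4 can be sketched as follows, accumulating rate 1 exponentials as "area swept times $\lambda$" and converting each partial sum to a radius. The function name and `rng` parameter are illustrative assumptions.

```python
import math
import random


def poisson_points_in_circle(rate, radius, rng=random):
    """Polar coordinates (R_i, theta_i) of the Poisson points within distance radius of O."""
    points, scaled_area = [], 0.0
    limit = rate * math.pi * radius ** 2
    while True:
        scaled_area += rng.expovariate(1.0)    # X_1 + ... + X_n, rate 1 exponentials
        if scaled_area > limit:                # equivalent to (X_1 + ... + X_n)/(rate * pi) > r^2
            return points
        r_i = math.sqrt(scaled_area / (rate * math.pi))
        theta_i = 2.0 * math.pi * rng.random()
        points.append((r_i, theta_i))
```

The points come out already ordered by distance from O, which is the property that the alternative count-then-place approach has to pay for with a sort.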


The preceding algorithm requires, on average, $1 + \lambda\pi r^2$ exponentials and an equal
number of uniform random numbers. Another approach to simulating points in C(r) is
to first simulate N, the number of such points, and then use the fact that, given N, the
points are uniformly distributed in C(r). This latter procedure requires the simulation
of N, a Poisson random variable with mean $\lambda\pi r^2$; we must then simulate N uniform
points on C(r), by simulating R from the distribution $F_R(a) = a^2/r^2$ (see Exercise 25)
and $\theta$ from uniform $(0, 2\pi)$ and must then sort these N uniform values in increasing
order of R. The main advantage of the first procedure is that it eliminates the need to
sort.

Figure 11.4

The preceding algorithm can be thought of as the fanning out of a circle centered
at O with a radius that expands continuously from 0 to r. The successive radii at
which Poisson points are encountered is simulated by noting that the additional area
necessary to encompass a Poisson point is always, independent of the past, exponential
with rate $\lambda$. This technique can be used to simulate the process over noncircular regions.
For instance, consider a nonnegative function g(x), and suppose we are interested in
simulating the Poisson process in the region between the x-axis and g with x going
from 0 to T (see Figure 11.4). To do so we can start at the left-hand end and fan
vertically to the right by considering the successive areas $\int_0^a g(x)\, dx$. Now if
$X_1 < X_2 < \cdots$ denote the successive projections of the Poisson points on the x-axis, then
analogous to Proposition 11.6, it will follow that (with $X_0 = 0$) $\lambda \int_{X_{i-1}}^{X_i} g(x)\, dx$, $i \geq 1$,
will be independent exponentials with rate 1. Hence, we should simulate $\epsilon_1, \epsilon_2, \ldots$,
independent exponentials with rate 1, stopping at

$$
N = \min\left\{n : \epsilon_1 + \cdots + \epsilon_n > \lambda \int_0^T g(x)\, dx\right\}
$$

and determine $X_1, \ldots, X_{N-1}$ by

$$
\begin{aligned}
\lambda \int_0^{X_1} g(x)\, dx &= \epsilon_1, \\
\lambda \int_{X_1}^{X_2} g(x)\, dx &= \epsilon_2, \\
&\;\;\vdots \\
\lambda \int_{X_{N-2}}^{X_{N-1}} g(x)\, dx &= \epsilon_{N-1}
\end{aligned}
$$

If we now simulate $U_1, \ldots, U_{N-1}$, independent uniform (0, 1) random numbers,
then as the projection on the y-axis of the Poisson point whose x-coordinate is $X_i$ is
uniform on $(0, g(X_i))$, it follows that the simulated Poisson points in the interval are
$(X_i, U_i g(X_i))$, i = 1, ..., N − 1.










Of course, the preceding technique is most useful when g is regular enough so that
the foregoing equations can be solved for the $X_i$. For instance, if g(x) = y (and so the
region of interest is a rectangle), then

$$
X_i = \frac{\epsilon_1 + \cdots + \epsilon_i}{\lambda y}, \quad i = 1, \ldots, N - 1
$$

and the Poisson points are

$$
(X_i, yU_i), \quad i = 1, \ldots, N - 1
$$


11.6 Variance Reduction Techniques 


Let $X_1, \ldots, X_n$ have a given joint distribution, and suppose we are interested in
computing

$$
\theta \equiv E[g(X_1, \ldots, X_n)]
$$

where g is some specified function. It is often the case that it is not possible to
analytically compute the preceding, and when such is the case we can attempt to use simulation
to estimate $\theta$. This is done as follows: Generate $X_1^{(1)}, \ldots, X_n^{(1)}$ having the same joint
distribution as $X_1, \ldots, X_n$ and set

$$
Y_1 = g\left(X_1^{(1)}, \ldots, X_n^{(1)}\right)
$$

Now, simulate a second set of random variables (independent of the first set)
$X_1^{(2)}, \ldots, X_n^{(2)}$ having the distribution of $X_1, \ldots, X_n$ and set

$$
Y_2 = g\left(X_1^{(2)}, \ldots, X_n^{(2)}\right)
$$

Continue this until you have generated k (some predetermined number) sets, and so
have also computed $Y_1, Y_2, \ldots, Y_k$. Now, $Y_1, \ldots, Y_k$ are independent and identically
distributed random variables each having the same distribution as $g(X_1, \ldots, X_n)$. Thus,
if we let $\bar{Y}$ denote the average of these k random variables, that is,

$$
\bar{Y} = \sum_{i=1}^{k} Y_i/k
$$

then

$$
E[\bar{Y}] = \theta, \qquad E\left[(\bar{Y} - \theta)^2\right] = \text{Var}(\bar{Y})
$$

Hence, we can use $\bar{Y}$ as an estimate of $\theta$. As the expected square of the difference
between $\bar{Y}$ and $\theta$ is equal to the variance of $\bar{Y}$, we would like this quantity to be as
small as possible. In the preceding situation, $\text{Var}(\bar{Y}) = \text{Var}(Y_i)/k$, which is usually not
known in advance but must be estimated from the generated values $Y_1, \ldots, Y_k$. We
now present three general techniques for reducing the variance of our estimator.

11.6.1 Use of Antithetic Variables

In the preceding situation, suppose that we have generated $Y_1$ and $Y_2$, identically
distributed random variables having mean $\theta$. Now,

$$
\begin{aligned}
\text{Var}\left(\frac{Y_1 + Y_2}{2}\right) &= \frac{1}{4}\left[\text{Var}(Y_1) + \text{Var}(Y_2) + 2\text{Cov}(Y_1, Y_2)\right] \\
&= \frac{\text{Var}(Y_1)}{2} + \frac{\text{Cov}(Y_1, Y_2)}{2}
\end{aligned}
$$

Hence, it would be advantageous (in the sense that the variance would be reduced) if
$Y_1$ and $Y_2$ rather than being independent were negatively correlated. To see how we
could arrange this, let us suppose that the random variables $X_1, \ldots, X_n$ are independent
and, in addition, that each is simulated via the inverse transform technique. That is, $X_i$
is simulated from $F_i^{-1}(U_i)$ where $U_i$ is a random number and $F_i$ is the distribution
of $X_i$. Hence, $Y_1$ can be expressed as

$$
Y_1 = g\left(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n)\right)
$$

Now, since $1 - U$ is also uniform over (0, 1) whenever U is a random number (and is
negatively correlated with U) it follows that $Y_2$ defined by

$$
Y_2 = g\left(F_1^{-1}(1 - U_1), \ldots, F_n^{-1}(1 - U_n)\right)
$$

will have the same distribution as $Y_1$. Hence, if $Y_1$ and $Y_2$ were negatively correlated,
then generating $Y_2$ by this means would lead to a smaller variance than if it were
generated by a new set of random numbers. (In addition, there is a computational
savings since rather than having to generate n additional random numbers, we need
only subtract each of the previous n from 1.) The following theorem will be the key to
showing that this technique, known as the use of antithetic variables, will lead to a
reduction in variance whenever g is a monotone function.

Theorem 11.1 If $X_1, \ldots, X_n$ are independent, then, for any increasing functions f
and g of n variables,

$$E[f(\mathbf{X})g(\mathbf{X})] \geq E[f(\mathbf{X})]\,E[g(\mathbf{X})] \tag{11.11}$$

where $\mathbf{X} = (X_1, \ldots, X_n)$.

Proof. The proof is by induction on n. To prove it when n = 1, let f and g be
increasing functions of a single variable. Then, for any x and y,

$$(f(x) - f(y))(g(x) - g(y)) \geq 0$$

since if $x \geq y$ ($x \leq y$) then both factors are nonnegative (nonpositive). Hence, for any
random variables X and Y,

$$(f(X) - f(Y))(g(X) - g(Y)) \geq 0$$

implying that

$$E[(f(X) - f(Y))(g(X) - g(Y))] \geq 0$$

or, equivalently,

$$E[f(X)g(X)] + E[f(Y)g(Y)] \geq E[f(X)g(Y)] + E[f(Y)g(X)]$$

If we now take X and Y to be independent and identically distributed, then

$$E[f(X)g(X)] = E[f(Y)g(Y)],$$

$$E[f(X)g(Y)] = E[f(Y)g(X)] = E[f(X)]\,E[g(X)]$$

and so we obtain the result when n = 1.

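So assume that (11.11) holds for $n - 1$ variables, and now suppose that $X_1, \ldots, X_n$
are independent and f and g are increasing functions. Then

$$\begin{aligned}
E[f(\mathbf{X})g(\mathbf{X}) \mid X_n = x_n]
&= E[f(X_1, \ldots, X_{n-1}, x_n)\,g(X_1, \ldots, X_{n-1}, x_n) \mid X_n = x_n] \\
&= E[f(X_1, \ldots, X_{n-1}, x_n)\,g(X_1, \ldots, X_{n-1}, x_n)] && \text{by independence} \\
&\geq E[f(X_1, \ldots, X_{n-1}, x_n)]\,E[g(X_1, \ldots, X_{n-1}, x_n)] && \text{by the induction hypothesis} \\
&= E[f(\mathbf{X}) \mid X_n = x_n]\,E[g(\mathbf{X}) \mid X_n = x_n]
\end{aligned}$$

Hence,

$$E[f(\mathbf{X})g(\mathbf{X}) \mid X_n] \geq E[f(\mathbf{X}) \mid X_n]\,E[g(\mathbf{X}) \mid X_n]$$

and, upon taking expectations of both sides,

$$E[f(\mathbf{X})g(\mathbf{X})] \geq E\big[E[f(\mathbf{X}) \mid X_n]\,E[g(\mathbf{X}) \mid X_n]\big] \geq E[f(\mathbf{X})]\,E[g(\mathbf{X})]$$

The last inequality follows because $E[f(\mathbf{X}) \mid X_n]$ and $E[g(\mathbf{X}) \mid X_n]$ are both increasing
functions of $X_n$, and so, by the result for n = 1,

$$E\big[E[f(\mathbf{X}) \mid X_n]\,E[g(\mathbf{X}) \mid X_n]\big] \geq E\big[E[f(\mathbf{X}) \mid X_n]\big]\,E\big[E[g(\mathbf{X}) \mid X_n]\big] = E[f(\mathbf{X})]\,E[g(\mathbf{X})] \qquad \blacksquare$$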

Corollary 11.7 If $U_1, \ldots, U_n$ are independent, and k is either an increasing or
decreasing function, then

$$\mathrm{Cov}\big(k(U_1, \ldots, U_n),\, k(1 - U_1, \ldots, 1 - U_n)\big) \leq 0$$

Proof. Suppose k is increasing. As $-k(1 - U_1, \ldots, 1 - U_n)$ is increasing in $U_1, \ldots, U_n$,
then, from Theorem 11.1,

$$\mathrm{Cov}\big(k(U_1, \ldots, U_n),\, -k(1 - U_1, \ldots, 1 - U_n)\big) \geq 0$$

When k is decreasing just replace k by its negative. ■

Since $F_i^{-1}(U_i)$ is increasing in $U_i$ (as $F_i$, being a distribution function, is increasing)
it follows that $g(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n))$ is a monotone function of $U_1, \ldots, U_n$ whenever
g is monotone. Hence, if g is monotone the antithetic variable approach of twice
using each set of random numbers $U_1, \ldots, U_n$ by first computing $g(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n))$
and then $g(F_1^{-1}(1 - U_1), \ldots, F_n^{-1}(1 - U_n))$ will reduce the variance of
the estimate of $E[g(X_1, \ldots, X_n)]$. That is, rather than generating k sets of n random
numbers, we should generate k/2 sets and use each set twice.
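
As a rough illustration of the antithetic approach, the following Python sketch estimates
$E[e^U]$ for a single uniform U, so that the exact answer $e - 1$ is available for comparison.
The choice $g(u) = e^u$, the function names, the seed, and the sample sizes are assumptions
made only for this illustration; with n = 1 and F uniform, the inverse transform is the identity.

```python
import math
import random
import statistics

def g(u):
    """Monotone test function (an assumption); E[g(U)] = e - 1 for U uniform on (0, 1)."""
    return math.exp(u)

def independent_pairs(num_pairs, rng):
    """Average two evaluations that use two independent random numbers."""
    return [(g(rng.random()) + g(rng.random())) / 2 for _ in range(num_pairs)]

def antithetic_pairs(num_pairs, rng):
    """Average g(U) and g(1 - U), so each random number is used twice."""
    values = []
    for _ in range(num_pairs):
        u = rng.random()
        values.append((g(u) + g(1.0 - u)) / 2)
    return values

rng = random.Random(12345)
raw = independent_pairs(20000, rng)
anti = antithetic_pairs(20000, rng)
print("independent pairs:", statistics.fmean(raw), statistics.variance(raw))
print("antithetic pairs: ", statistics.fmean(anti), statistics.variance(anti))
```

Both lists have the same mean in expectation; because g is increasing, the per-pair sample
variance of the antithetic version should come out noticeably smaller.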

Example 11.14 (Simulating the Reliability Function) Consider a system of n components
in which component i, independently of other components, works with probability
$p_i$, $i = 1, \ldots, n$. Letting

$$X_i = \begin{cases} 1, & \text{if component } i \text{ works} \\ 0, & \text{otherwise} \end{cases}$$

suppose there is a monotone structure function $\phi$ such that

$$\phi(X_1, \ldots, X_n) = \begin{cases} 1, & \text{if the system works under } X_1, \ldots, X_n \\ 0, & \text{otherwise} \end{cases}$$

We are interested in using simulation to estimate

$$r(p_1, \ldots, p_n) = E[\phi(X_1, \ldots, X_n)] = P\{\phi(X_1, \ldots, X_n) = 1\}$$

Now, we can simulate the $X_i$ by generating uniform random numbers $U_1, \ldots, U_n$ and
then setting

$$X_i = \begin{cases} 1, & \text{if } U_i < p_i \\ 0, & \text{otherwise} \end{cases}$$

Hence, we see that

$$\phi(X_1, \ldots, X_n) = k(U_1, \ldots, U_n)$$

where k is a decreasing function of $U_1, \ldots, U_n$. Hence,

$$\mathrm{Cov}\big(k(\mathbf{U}),\, k(\mathbf{1} - \mathbf{U})\big) \leq 0$$

and so the antithetic variable approach of using $U_1, \ldots, U_n$ to generate both
$k(U_1, \ldots, U_n)$ and $k(1 - U_1, \ldots, 1 - U_n)$ results in a smaller variance than if an
independent set of random numbers was used to generate the second k. ■

Example 11.15 (Simulating a Queueing System) Consider a given queueing system,
let $D_i$ denote the delay in queue of the ith arriving customer, and suppose we are
interested in simulating the system so as to estimate

$$\theta = E[D_1 + \cdots + D_n]$$

Let $X_1, \ldots, X_n$ denote the first n interarrival times and $S_1, \ldots, S_n$ the first n service
times of this system, and suppose these random variables are all independent. Now in
most systems $D_1 + \cdots + D_n$ will be a function of $X_1, \ldots, X_n, S_1, \ldots, S_n$, say,

$$D_1 + \cdots + D_n = g(X_1, \ldots, X_n, S_1, \ldots, S_n)$$

Also, g will usually be increasing in $S_i$ and decreasing in $X_i$, $i = 1, \ldots, n$. If we use
the inverse transform method to simulate $X_i, S_i$, $i = 1, \ldots, n$, say, $X_i = F_i^{-1}(1 - U_i)$,
$S_i = G_i^{-1}(\bar{U}_i)$ where $U_1, \ldots, U_n, \bar{U}_1, \ldots, \bar{U}_n$ are independent uniform random
numbers, then we may write

$$D_1 + \cdots + D_n = k(U_1, \ldots, U_n, \bar{U}_1, \ldots, \bar{U}_n)$$

where k is increasing in its variates. Hence, the antithetic variable approach will reduce
the variance of the estimator of $\theta$. (Thus, we would generate $U_i, \bar{U}_i$, $i = 1, \ldots, n$ and
set $X_i = F_i^{-1}(1 - U_i)$ and $S_i = G_i^{-1}(\bar{U}_i)$ for the first run, and $X_i = F_i^{-1}(U_i)$ and
$S_i = G_i^{-1}(1 - \bar{U}_i)$ for the second.) As all the $U_i$ and $\bar{U}_i$ are independent, however, this
is equivalent to setting $X_i = F_i^{-1}(U_i)$, $S_i = G_i^{-1}(\bar{U}_i)$ in the first run and using $1 - U_i$
for $U_i$ and $1 - \bar{U}_i$ for $\bar{U}_i$ in the second. ■

11.6.2 Variance Reduction by Conditioning 

Let us start by recalling (see Proposition 3.1) the conditional variance formula

$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid Z)] + \mathrm{Var}(E[Y \mid Z]) \tag{11.12}$$

Now suppose we are interested in estimating $E[g(X_1, \ldots, X_n)]$ by simulating
$\mathbf{X} = (X_1, \ldots, X_n)$ and then computing $Y = g(X_1, \ldots, X_n)$. Now, if for some random
variable Z we can compute $E[Y \mid Z]$ then, as $\mathrm{Var}(Y \mid Z) \geq 0$, it follows from the conditional
variance formula that

$$\mathrm{Var}(E[Y \mid Z]) \leq \mathrm{Var}(Y)$$

implying, since $E[E[Y \mid Z]] = E[Y]$, that $E[Y \mid Z]$ is a better estimator of $E[Y]$ than
is Y.

In many situations, there are a variety of $Z_i$ that can be conditioned on to obtain
an improved estimator. Each of these estimators $E[Y \mid Z_i]$ will have mean $E[Y]$ and
smaller variance than does the raw estimator Y. We now show that for any choice of
weights $\lambda_i$, $\lambda_i \geq 0$, $\sum_i \lambda_i = 1$, $\sum_i \lambda_i E[Y \mid Z_i]$ is also an improvement over Y.

Proposition 11.8 For any $\lambda_i \geq 0$, $\sum_{i=1}^{\infty} \lambda_i = 1$,
(a) $E\left[\sum_i \lambda_i E[Y \mid Z_i]\right] = E[Y]$,
(b) $\mathrm{Var}\left(\sum_i \lambda_i E[Y \mid Z_i]\right) \leq \mathrm{Var}(Y)$.

Proof. The proof of (a) is immediate. To prove (b), let N denote an integer valued
random variable independent of all the other random variables under consideration and
such that

$$P\{N = i\} = \lambda_i, \quad i \geq 1$$

Applying the conditional variance formula twice yields

$$\begin{aligned}
\mathrm{Var}(Y) &\geq \mathrm{Var}(E[Y \mid N, Z_N]) \\
&\geq \mathrm{Var}\big(E[E[Y \mid N, Z_N] \mid Z_1, \ldots]\big) \\
&= \mathrm{Var}\Big(\sum_i \lambda_i E[Y \mid Z_i]\Big)
\end{aligned}$$


Example 11.16 Consider a queueing system having Poisson arrivals and suppose that
any customer arriving when there are already N others in the system is lost. Suppose
that we are interested in using simulation to estimate the expected number of lost
customers by time t. The raw simulation approach would be to simulate the system up
to time t and determine L, the number of lost customers for that run. A better estimate,
however, can be obtained by conditioning on the total time in [0, t] that the system is
at capacity. Indeed, if we let T denote the time in [0, t] that there are N in the system,
then

$$E[L \mid T] = \lambda T$$

where $\lambda$ is the Poisson arrival rate. Hence, a better estimate for E[L] than the average
value of L over all simulation runs can be obtained by multiplying the average value
of T per simulation run by $\lambda$. If the arrival process were a nonhomogeneous Poisson
process, then we could improve over the raw estimator L by keeping track of those
time periods for which the system is at capacity. If we let $I_1, \ldots, I_C$ denote the time
intervals in [0, t] in which there are N in the system, then

$$E[L \mid I_1, \ldots, I_C] = \sum_{i=1}^{C} \int_{I_i} \lambda(s)\, ds$$

where $\lambda(s)$ is the intensity function of the nonhomogeneous Poisson arrival process.
The use of the right side of the preceding would thus lead to a better estimate of E[L]
than the raw estimator L.
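
The homogeneous case of Example 11.16 can be sketched in Python as follows. The example
itself specifies only Poisson arrivals and a capacity of N; the single exponential server,
the parameter values, and all names below are assumptions introduced for illustration.

```python
import random
import statistics

def simulate_run(lam, mu, capacity, horizon, rng):
    """Simulate one run on [0, horizon], starting empty.

    Returns (lost, time_full): the number of lost arrivals L and the total
    time T during which `capacity` customers were present.
    """
    t, in_system, lost, time_full = 0.0, 0, 0, 0.0
    while True:
        # Arrivals keep occurring at rate lam even when the system is full.
        rate = lam + (mu if in_system > 0 else 0.0)
        dt = rng.expovariate(rate)
        if t + dt >= horizon:
            if in_system == capacity:
                time_full += horizon - t
            return lost, time_full
        if in_system == capacity:
            time_full += dt
        t += dt
        if rng.random() < lam / rate:   # the event is an arrival
            if in_system == capacity:
                lost += 1
            else:
                in_system += 1
        else:                           # the event is a service completion
            in_system -= 1

rng = random.Random(2024)
lam, mu, capacity, horizon = 1.0, 1.2, 3, 20.0
runs = [simulate_run(lam, mu, capacity, horizon, rng) for _ in range(5000)]
raw = [lost for lost, _ in runs]                          # estimator L
conditioned = [lam * time_full for _, time_full in runs]  # estimator lambda * T
print("raw:        ", statistics.fmean(raw), statistics.variance(raw))
print("conditioned:", statistics.fmean(conditioned), statistics.variance(conditioned))
```

The two sample means should agree closely, while the sample variance of $\lambda T$ should be
smaller than that of L, as the conditional variance formula predicts.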


Example 11.17 Suppose that we wanted to estimate the expected sum of the times
in the system of the first n customers in a queueing system. That is, if $W_i$ is the time
that the ith customer spends in the system, then we are interested in estimating

$$\theta = E\left[\sum_{i=1}^{n} W_i\right]$$

Let $Y_i$ denote the "state of the system" at the moment at which the ith customer arrives.
It can be shown* that for a wide class of models the estimator $\sum_{i=1}^{n} E[W_i \mid Y_i]$ has (the
same mean and) a smaller variance than the estimator $\sum_{i=1}^{n} W_i$. (It should be noted
that whereas it is immediate that $E[W_i \mid Y_i]$ has smaller variance than $W_i$, because of
the covariance terms involved it is not immediately apparent that $\sum_{i=1}^{n} E[W_i \mid Y_i]$ has
smaller variance than $\sum_{i=1}^{n} W_i$.) For instance, in the model G/M/1

$$E[W_i \mid Y_i] = (N_i + 1)/\mu$$

where $N_i$ is the number in the system encountered by the ith arrival and $1/\mu$ is the
mean service time; the result implies that $\sum_{i=1}^{n} (N_i + 1)/\mu$ is a better estimate of the
expected total time in the system of the first n customers than is the raw estimator
$\sum_{i=1}^{n} W_i$. ■

Example 11.18 (Estimating the Renewal Function by Simulation) Consider a
queueing model in which customers arrive daily in accordance with a renewal process
having interarrival distribution F. However, suppose that at some fixed time T, for
instance 5 P.M., no additional arrivals are permitted and those customers that are still
in the system are serviced. At the start of the next and each succeeding day customers
again begin to arrive in accordance with the renewal process. Suppose we are interested
in determining the average time that a customer spends in the system. Upon using the
theory of renewal reward processes (with a cycle starting every T time units), it can be
shown that

$$\text{average time that a customer spends in the system} = \frac{E[\text{sum of the times in the system of arrivals in } (0, T)]}{m(T)}$$

where m(T) is the expected number of renewals in (0, T).

If we were to use simulation to estimate the preceding quantity, a run would consist
of simulating a single day, and as part of a simulation run, we would observe
the quantity N(T), the number of arrivals by time T. Since E[N(T)] = m(T), the
natural simulation estimator of m(T) would be the average (over all simulated days)
value of N(T) obtained. However, Var(N(T)) is, for large T, proportional to T (its
asymptotic form being $T\sigma^2/\mu^3$, where $\sigma^2$ is the variance and $\mu$ the mean of the
interarrival distribution F), and so, for large T, the variance of our estimator would be large.
A considerable improvement can be obtained by using the analytic formula (see
Section 7.3)

$$m(T) = \frac{T}{\mu} - 1 + \frac{E[Y(T)]}{\mu} \tag{11.13}$$

where Y(T) denotes the time from T until the next renewal, that is, it is the excess
life at T. Since the variance of Y(T) does not grow with T (indeed, it converges to a
finite value provided the moments of F are finite), it follows that for T large, we would
do much better by using the simulation to estimate E[Y(T)] and then using Equation
(11.13) to estimate m(T).


* S. M. Ross, "Simulating Average Delay—Variance Reduction by Conditioning," Probability in the
Engineering and Informational Sciences 2(3), (1988), pp. 309-312.







Figure 11.5 A(T) = x. (Time line marked at T - x, T, and T + Y(T).)

However, by employing conditioning, we can improve further on our estimate of
m(T). To do so, let A(T) denote the age of the renewal process at time T, that is, it
is the time at T since the last renewal. Then, rather than using the value of Y(T), we
can reduce the variance by considering $E[Y(T) \mid A(T)]$. Now, knowing that the age at
T is equal to x is equivalent to knowing that there was a renewal at time $T - x$ and the
next interarrival time X is greater than x. Since the excess at T will equal $X - x$ (see
Figure 11.5), it follows that

$$E[Y(T) \mid A(T) = x] = E[X - x \mid X > x] = \int_0^{\infty} \frac{P\{X - x > t\}}{P\{X > x\}}\, dt = \int_0^{\infty} \frac{1 - F(t + x)}{1 - F(x)}\, dt$$

which can be numerically evaluated if necessary.

As an illustration of the preceding, note that if the renewal process is a Poisson
process with rate $\lambda$, then the raw simulation estimator N(T) will have variance $\lambda T$;
since Y(T) will be exponential with rate $\lambda$, the estimator based on (11.13) will have
variance $\lambda^2\, \mathrm{Var}(Y(T)) = 1$. On the other hand, since Y(T) will be independent
of A(T) (and $E[Y(T) \mid A(T)] = 1/\lambda$), it follows that the variance of the improved
estimator $E[Y(T) \mid A(T)]$ is 0. That is, conditioning on the age at time T yields, in this
case, the exact answer. ■
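
The Poisson illustration can be checked numerically with a short Python sketch. It compares
the raw estimator N(T) with the estimator $T/\mu - 1 + Y(T)/\mu$ suggested by Equation (11.13),
where $\mu = 1/\lambda$. The rate, horizon, seed, and function names are assumptions chosen for
the illustration.

```python
import random
import statistics

def renewal_run(rate, horizon, rng):
    """Simulate one day of a Poisson arrival process with the given rate.

    Returns (count, excess): N(T) and the excess life Y(T).
    """
    count, t = 0, rng.expovariate(rate)
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)
    return count, t - horizon   # t is now the first renewal time after T

rng = random.Random(7)
rate, horizon = 2.0, 10.0
mu = 1.0 / rate                 # mean interarrival time
runs = [renewal_run(rate, horizon, rng) for _ in range(20000)]
raw = [count for count, _ in runs]
via_excess = [horizon / mu - 1.0 + excess / mu for _, excess in runs]
print("N(T):        ", statistics.fmean(raw), statistics.variance(raw))
print("via (11.13): ", statistics.fmean(via_excess), statistics.variance(via_excess))
```

With these values $m(T) = \lambda T = 20$; the sample variance of N(T) should be near 20, while
that of the excess-based estimator should be near 1.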

Example 11.19 Consider the M/G/1 queueing system where customers arrive in
accordance with a Poisson process with rate $\lambda$ to a single server having service
distribution G with mean E[S]. Suppose that, for a specified time $t_0$, the server will take a
break at the first time $t \geq t_0$ at which the system is empty. That is, if X(t) is the number
of customers in the system at time t, then the server will take a break at time

$$T = \min\{t \geq t_0 : X(t) = 0\}$$

To efficiently use simulation to estimate E[T], generate the system to time $t_0$; let R
denote the remaining service time of the customer in service at time $t_0$, and let $X_Q$
equal the number of customers waiting in queue at time $t_0$. (Note that R is equal to 0 if
$X(t_0) = 0$, and $X_Q = (X(t_0) - 1)^+$.) Now, with N equal to the number of customers
that arrive in the remaining service time R, it follows that if N = n and $X_Q = n_Q$,
then the additional amount of time from $t_0 + R$ until the server can take a break is equal
to the amount of time that it takes until the system, starting with $n + n_Q$ customers,
becomes empty. Because this is equal to the sum of $n + n_Q$ busy periods, it follows
from Section 8.5.3 that

$$E[T \mid R, N, X_Q] = t_0 + R + (N + X_Q)\,\frac{E[S]}{1 - \lambda E[S]}$$

Consequently,

$$\begin{aligned}
E[T \mid R, X_Q] &= E\big[E[T \mid R, N, X_Q] \mid R, X_Q\big] \\
&= t_0 + R + (E[N \mid R, X_Q] + X_Q)\,\frac{E[S]}{1 - \lambda E[S]} \\
&= t_0 + R + (\lambda R + X_Q)\,\frac{E[S]}{1 - \lambda E[S]}
\end{aligned}$$

Thus, rather than using the generated value of T as the estimator from a simulation run, it
is better to stop the simulation at time $t_0$ and use the estimator
$t_0 + R + (\lambda R + X_Q)\,\frac{E[S]}{1 - \lambda E[S]}$. ■


11.6.3 Control Variates 

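Again suppose we want to use simulation to estimate $E[g(\mathbf{X})]$ where $\mathbf{X} = (X_1, \ldots, X_n)$.
But now suppose that for some function f the expected value of $f(\mathbf{X})$ is known, say,
$E[f(\mathbf{X})] = \mu$. Then for any constant a we can also use

$$W = g(\mathbf{X}) + a\,(f(\mathbf{X}) - \mu)$$

as an estimator of $E[g(\mathbf{X})]$. Now,

$$\mathrm{Var}(W) = \mathrm{Var}(g(\mathbf{X})) + a^2\, \mathrm{Var}(f(\mathbf{X})) + 2a\, \mathrm{Cov}(g(\mathbf{X}), f(\mathbf{X}))$$

Simple calculus shows that the preceding is minimized when

$$a = \frac{-\mathrm{Cov}(f(\mathbf{X}), g(\mathbf{X}))}{\mathrm{Var}(f(\mathbf{X}))}$$

and, for this value of a,

$$\mathrm{Var}(W) = \mathrm{Var}(g(\mathbf{X})) - \frac{[\mathrm{Cov}(f(\mathbf{X}), g(\mathbf{X}))]^2}{\mathrm{Var}(f(\mathbf{X}))}$$

Because $\mathrm{Var}(f(\mathbf{X}))$ and $\mathrm{Cov}(f(\mathbf{X}), g(\mathbf{X}))$ are usually unknown, the simulated data
should be used to estimate these quantities.

Dividing the preceding equation by $\mathrm{Var}(g(\mathbf{X}))$ shows that

$$\frac{\mathrm{Var}(W)}{\mathrm{Var}(g(\mathbf{X}))} = 1 - \mathrm{Corr}^2(f(\mathbf{X}), g(\mathbf{X}))$$

where Corr(X, Y) is the correlation between X and Y. Consequently, the use of a
control variate will greatly reduce the variance of the simulation estimator whenever
$f(\mathbf{X})$ and $g(\mathbf{X})$ are strongly correlated.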
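
A minimal Python sketch of the control variate idea follows. It estimates $E[e^U]$ for a
uniform U using $f(U) = U$, whose mean 1/2 is known, as the control. The target function,
the names, and the sample size are assumptions for illustration. Note that a is estimated
from the same simulated data, as suggested above, which can introduce a small bias.

```python
import math
import random
import statistics

def control_variate_values(g_vals, f_vals, f_mean):
    """Return the values g + a (f - f_mean), with a estimated from the data."""
    g_bar = statistics.fmean(g_vals)
    f_bar = statistics.fmean(f_vals)
    cov = sum((g - g_bar) * (f - f_bar)
              for g, f in zip(g_vals, f_vals)) / (len(g_vals) - 1)
    a = -cov / statistics.variance(f_vals)      # estimate of the minimizing a
    return [g + a * (f - f_mean) for g, f in zip(g_vals, f_vals)]

rng = random.Random(99)
u = [rng.random() for _ in range(20000)]
g_vals = [math.exp(x) for x in u]                   # g(X) = e^U, mean e - 1
adjusted = control_variate_values(g_vals, u, 0.5)   # f(X) = U, known mean 1/2
print("raw:     ", statistics.fmean(g_vals), statistics.variance(g_vals))
print("adjusted:", statistics.fmean(adjusted), statistics.variance(adjusted))
```

Because $e^U$ and U are very strongly correlated, the adjusted values should show a sample
variance that is only a small fraction of the raw one.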

Example 11.20 Consider a continuous-time Markov chain that, upon entering state i,
spends an exponential time with rate $v_i$ in that state before making a transition into some
other state, with the transition being into state j with probability $P_{i,j}$, $i \geq 0$, $j \neq i$.
Suppose that costs are incurred at rate $C(i) \geq 0$ per unit time whenever the chain is in
state i, $i \geq 0$. With X(t) equal to the state at time t, and $\alpha$ being a constant such that
$0 < \alpha < 1$, the quantity

$$W = \int_0^{\infty} e^{-\alpha t}\, C(X(t))\, dt$$

represents the total discounted cost. For a given initial state, suppose we want to use
simulation to estimate E[W]. Whereas at first it might seem that we cannot obtain
an unbiased estimator without simulating the continuous-time Markov chain for an
infinite amount of time (which is clearly impossible), we can make use of the results
of Example 5.1, which gives the equivalent expression for E[W]:

$$E[W] = E\left[\int_0^{T} C(X(t))\, dt\right]$$

where T is an exponential random variable with rate $\alpha$ that is independent of the
continuous-time Markov chain. Therefore, we can first generate the value of T, then
generate the states of the continuous-time Markov chain up to time T, to obtain the
unbiased estimator $\int_0^{T} C(X(t))\, dt$. Because all the cost rates are nonnegative this
estimator is strongly positively correlated with T, which will thus make an effective control
variate. ■
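
The procedure of Example 11.20 can be sketched in Python for a small chain. The two-state
chain that alternates between states 0 and 1, its rates and costs, the value of $\alpha$, and
the helper names are all assumptions made for illustration. For this chain the exact
value of E[W] follows from the first-step equations $V_i = (C(i) + v_i V_{1-i})/(\alpha + v_i)$,
which gives a check on the simulation.

```python
import random
import statistics

def discounted_cost_run(rates, cost, alpha, start, rng):
    """Return (integral, horizon) for one run of the two-state chain.

    horizon is an independent exponential(alpha) time T, and integral is
    the undiscounted cost accumulated on [0, T].
    """
    horizon = rng.expovariate(alpha)
    t, state, total = 0.0, start, 0.0
    while True:
        sojourn = rng.expovariate(rates[state])
        if t + sojourn >= horizon:
            return total + cost[state] * (horizon - t), horizon
        total += cost[state] * sojourn
        t += sojourn
        state = 1 - state           # the chain alternates between 0 and 1

def exact_value(rates, cost, alpha, start):
    """Solve V_i = (C(i) + v_i V_{1-i}) / (alpha + v_i) for i = 0, 1."""
    (v0, v1), (c0, c1) = rates, cost
    denom = (alpha + v0) * (alpha + v1) - v0 * v1
    values = ((c0 * (alpha + v1) + v0 * c1) / denom,
              (c1 * (alpha + v0) + v1 * c0) / denom)
    return values[start]

def with_control(values, controls, control_mean):
    """Adjust values by a control variate, estimating a from the data."""
    v_bar, c_bar = statistics.fmean(values), statistics.fmean(controls)
    cov = sum((v - v_bar) * (c - c_bar)
              for v, c in zip(values, controls)) / (len(values) - 1)
    a = -cov / statistics.variance(controls)
    return [v + a * (c - control_mean) for v, c in zip(values, controls)]

rng = random.Random(5)
rates, cost, alpha = (1.0, 2.0), (1.0, 3.0), 0.5
runs = [discounted_cost_run(rates, cost, alpha, 0, rng) for _ in range(20000)]
raw = [w for w, _ in runs]
adjusted = with_control(raw, [h for _, h in runs], 1.0 / alpha)  # E[T] = 1/alpha
print("exact:   ", exact_value(rates, cost, alpha, 0))
print("raw:     ", statistics.fmean(raw), statistics.variance(raw))
print("adjusted:", statistics.fmean(adjusted), statistics.variance(adjusted))
```

Both estimators should land near the exact value, with the control-variate version showing
the smaller sample variance.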


Example 11.21 (A Queueing System) Let $D_{n+1}$ denote the delay in queue of the
(n + 1)st customer in a queueing system in which the interarrival times are independent and
identically distributed (i.i.d.) with distribution F having mean $\mu_F$ and are independent
of the service times, which are i.i.d. with distribution G having mean $\mu_G$. If $X_i$ is the
interarrival time between arrival i and i + 1, and if $S_i$ is the service time of customer
i, $i \geq 1$, we may write

$$D_{n+1} = g(X_1, \ldots, X_n, S_1, \ldots, S_n)$$

To take into account the possibility that the simulated variables $X_i$, $S_i$ may by chance
be quite different from what might be expected we can let

$$f(X_1, \ldots, X_n, S_1, \ldots, S_n) = \sum_{i=1}^{n} (S_i - X_i)$$

As $E[f(\mathbf{X}, \mathbf{S})] = n(\mu_G - \mu_F)$ we could use

$$g(\mathbf{X}, \mathbf{S}) + a\,[f(\mathbf{X}, \mathbf{S}) - n(\mu_G - \mu_F)]$$

as an estimator of $E[D_{n+1}]$. Since $D_{n+1}$ and f are both increasing functions of
$S_i$, $-X_i$, $i = 1, \ldots, n$ it follows from Theorem 11.1 that $f(\mathbf{X}, \mathbf{S})$ and $D_{n+1}$ are
positively correlated, and so the simulated estimate of a should turn out to be negative.

If we wanted to estimate the expected sum of the delays in queue of the first N(T)
arrivals, then we could use $\sum_{i=1}^{N(T)} S_i$ as our control variable. Indeed as the arrival
process is usually assumed independent of the service times, it follows that

$$E\left[\sum_{i=1}^{N(T)} S_i\right] = E[S]\,E[N(T)]$$

where E[N(T)] can either be computed by the method suggested in Section 7.8 or
estimated from the simulation as in Example 11.18. This control variable could also be
used if the arrival process were a nonhomogeneous Poisson with rate $\lambda(t)$; in this case,

$$E[N(T)] = \int_0^{T} \lambda(t)\, dt$$



11.6.4 Importance Sampling 

Let $\mathbf{X} = (X_1, \ldots, X_n)$ denote a vector of random variables having a joint density
function (or joint mass function in the discrete case) $f(\mathbf{x}) = f(x_1, \ldots, x_n)$, and suppose
that we are interested in estimating

$$\theta = E[h(\mathbf{X})] = \int h(\mathbf{x}) f(\mathbf{x})\, d\mathbf{x}$$

where the preceding is an n-dimensional integral. (If the $X_i$ are discrete, then interpret
the integral as an n-fold summation.)

Suppose that a direct simulation of the random vector $\mathbf{X}$, so as to compute values
of $h(\mathbf{X})$, is inefficient, possibly because (a) it is difficult to simulate a random vector
having density function $f(\mathbf{x})$, or (b) the variance of $h(\mathbf{X})$ is large, or (c) a combination
of (a) and (b).

Another way in which we can use simulation to estimate $\theta$ is to note that if $g(\mathbf{x})$
is another probability density such that $f(\mathbf{x}) = 0$ whenever $g(\mathbf{x}) = 0$, then we can
express $\theta$ as

$$\theta = \int \frac{h(\mathbf{x}) f(\mathbf{x})}{g(\mathbf{x})}\, g(\mathbf{x})\, d\mathbf{x} = E_g\left[\frac{h(\mathbf{X}) f(\mathbf{X})}{g(\mathbf{X})}\right] \tag{11.14}$$

where we have written $E_g$ to emphasize that the random vector $\mathbf{X}$ has joint density
$g(\mathbf{x})$.

It follows from Equation (11.14) that $\theta$ can be estimated by successively generating
values of a random vector $\mathbf{X}$ having density function $g(\mathbf{x})$ and then using as the estimator
the average of the values of $h(\mathbf{X})f(\mathbf{X})/g(\mathbf{X})$. If a density function $g(\mathbf{x})$ can be chosen
so that the random variable $h(\mathbf{X})f(\mathbf{X})/g(\mathbf{X})$ has a small variance then this approach,
referred to as importance sampling, can result in an efficient estimator of $\theta$.

Let us now try to obtain a feel for why importance sampling can be useful. To begin,
note that $f(\mathbf{X})$ and $g(\mathbf{X})$ represent the respective likelihoods of obtaining the vector $\mathbf{X}$
when $\mathbf{X}$ is a random vector with respective densities f and g. Hence, if $\mathbf{X}$ is distributed
according to g, then it will usually be the case that $f(\mathbf{X})$ will be small in relation to
$g(\mathbf{X})$ and thus when $\mathbf{X}$ is simulated according to g the likelihood ratio $f(\mathbf{X})/g(\mathbf{X})$ will
usually be small in comparison to 1. However, it is easy to check that its mean is 1:

$$E_g\left[\frac{f(\mathbf{X})}{g(\mathbf{X})}\right] = \int \frac{f(\mathbf{x})}{g(\mathbf{x})}\, g(\mathbf{x})\, d\mathbf{x} = \int f(\mathbf{x})\, d\mathbf{x} = 1$$

Thus we see that even though $f(\mathbf{X})/g(\mathbf{X})$ is usually smaller than 1, its mean is equal to
1, thus implying that it is occasionally large and so will tend to have a large variance. So
how can $h(\mathbf{X})f(\mathbf{X})/g(\mathbf{X})$ have a small variance? The answer is that we can sometimes
arrange to choose a density g such that those values of $\mathbf{x}$ for which $f(\mathbf{x})/g(\mathbf{x})$ is
large are precisely the values for which $h(\mathbf{x})$ is exceedingly small, and thus the ratio
$h(\mathbf{X})f(\mathbf{X})/g(\mathbf{X})$ is always small. Since this will require that $h(\mathbf{x})$ sometimes be small,
importance sampling seems to work best when estimating a small probability; for in
this case the function $h(\mathbf{x})$ is equal to 1 when $\mathbf{x}$ lies in some set and is equal to 0
otherwise.

We will now consider how to select an appropriate density g. We will find that
the so-called tilted densities are useful. Let $M(t) = E_f[e^{tX}] = \int e^{tx} f(x)\, dx$ be the
moment generating function corresponding to a one-dimensional density f.

Definition 11.2 A density function

$$f_t(x) = \frac{e^{tx} f(x)}{M(t)}$$

is called a tilted density of f, $-\infty < t < \infty$.

A random variable with density $f_t$ tends to be larger than one with density f when
t > 0 and tends to be smaller when t < 0.

In certain cases the tilted distributions $f_t$ have the same parametric form as does f.

Example 11.22 If f is the exponential density with rate $\lambda$ then

$$f_t(x) = C e^{tx} \lambda e^{-\lambda x} = \lambda C e^{-(\lambda - t)x}$$

where C = 1/M(t) does not depend on x. Therefore, for $t < \lambda$, $f_t$ is an exponential
density with rate $\lambda - t$.

If f is a Bernoulli probability mass function with parameter p, then

$$f(x) = p^x (1 - p)^{1-x}, \quad x = 0, 1$$

Hence, $M(t) = E_f[e^{tX}] = pe^t + 1 - p$ and so

$$f_t(x) = \frac{1}{M(t)}\,(pe^t)^x (1 - p)^{1-x} = \left(\frac{pe^t}{pe^t + 1 - p}\right)^{x} \left(\frac{1 - p}{pe^t + 1 - p}\right)^{1-x} \tag{11.15}$$

That is, $f_t$ is the probability mass function of a Bernoulli random variable with
parameter

$$p_t = \frac{pe^t}{pe^t + 1 - p}$$

We leave it as an exercise to show that if f is a normal density with parameters $\mu$ and
$\sigma^2$ then $f_t$ is a normal density with mean $\mu + \sigma^2 t$ and variance $\sigma^2$. ■

In certain situations the quantity of interest is the sum of the independent random
variables $X_1, \ldots, X_n$. In this case the joint density f is the product of one-dimensional
densities. That is,

$$f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$$

where $f_i$ is the density function of $X_i$. In this situation it is often useful to generate the
$X_i$ according to their tilted densities, with a common choice of t employed.

Example 11.23 Let $X_1, \ldots, X_n$ be independent random variables having respective
probability density (or mass) functions $f_i$, for $i = 1, \ldots, n$. Suppose we are interested
in approximating the probability that their sum is at least as large as a, where a is much
larger than the mean of the sum. That is, we are interested in

$$\theta = P\{S \geq a\}$$

where $S = \sum_{i=1}^{n} X_i$, and where $a > \sum_{i=1}^{n} E[X_i]$. Letting $I\{S \geq a\}$ equal 1 if $S \geq a$
and letting it be 0 otherwise, we have that

$$\theta = E_{\mathbf{f}}[I\{S \geq a\}]$$

where $\mathbf{f} = (f_1, \ldots, f_n)$. Suppose now that we simulate $X_i$ according to the tilted
mass function $f_{i,t}$, $i = 1, \ldots, n$, with the value of t, t > 0, left to be determined. The
importance sampling estimator of $\theta$ would then be

$$\hat{\theta} = I\{S \geq a\} \prod_{i=1}^{n} \frac{f_i(X_i)}{f_{i,t}(X_i)}$$

Now,

$$\frac{f_i(X_i)}{f_{i,t}(X_i)} = M_i(t)\, e^{-tX_i}$$

and so

$$\hat{\theta} = I\{S \geq a\}\, M(t)\, e^{-tS}$$

where $M(t) = \prod_{i=1}^{n} M_i(t)$ is the moment generating function of S. Since t > 0 and
$I\{S \geq a\}$ is equal to 0 when S < a, it follows that

$$I\{S \geq a\}\, e^{-tS} \leq e^{-ta}$$

and so

$$\hat{\theta} \leq M(t)\, e^{-ta}$$


To make the bound on the estimator as small as possible we thus choose t, t > 0,
to minimize $M(t)e^{-ta}$. In doing so, we will obtain an estimator whose value on each
iteration is between 0 and $\min_t M(t)e^{-ta}$. It can be shown that the minimizing t, call it
$t^*$, is such that

$$E_{t^*}[S] = E_{t^*}\left[\sum_{i=1}^{n} X_i\right] = a$$

where, in the preceding, we mean that the expected value is to be taken under the
assumption that the distribution of $X_i$ is $f_{i,t^*}$ for $i = 1, \ldots, n$.

For instance, suppose that $X_1, \ldots, X_n$ are independent Bernoulli random variables
having respective parameters $p_i$, for $i = 1, \ldots, n$. Then, if we generate the $X_i$ according
to their tilted mass functions $p_{i,t}$, $i = 1, \ldots, n$, the importance sampling estimator of
$\theta = P\{S \geq a\}$ is

$$\hat{\theta} = I\{S \geq a\}\, e^{-tS} \prod_{i=1}^{n} (p_i e^t + 1 - p_i)$$

Since $p_{i,t}$ is the mass function of a Bernoulli random variable with parameter
$p_i e^t/(p_i e^t + 1 - p_i)$ it follows that

$$E_t\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \frac{p_i e^t}{p_i e^t + 1 - p_i}$$

The value of t that makes the preceding equal to a can be numerically approximated
and then utilized in the simulation.

As an illustration, suppose that n = 20, $p_i = 0.4$, and a = 16. Then

$$E_t[S] = 20\, \frac{0.4e^t}{0.4e^t + 0.6}$$

Setting this equal to 16 yields, after a little algebra,

$$e^{t^*} = 6$$

Thus, if we generate the Bernoullis using the parameter

$$\frac{0.4e^{t^*}}{0.4e^{t^*} + 0.6} = 0.8$$

then because

$$M(t^*) = (0.4e^{t^*} + 0.6)^{20} \quad \text{and} \quad e^{-t^*S} = (1/6)^S$$

we see that the importance sampling estimator is

$$\hat{\theta} = I\{S \geq 16\}\,(1/6)^S\, 3^{20}$$

It follows from the preceding that

$$\hat{\theta} \leq (1/6)^{16}\, 3^{20} = 81/2^{16} = 0.001236$$

That is, on each iteration the value of the estimator is between 0 and 0.001236. Since,
in this case, $\theta$ is the probability that a binomial random variable with parameters 20,
0.4 is at least 16, it can be explicitly computed with the result $\theta = 0.000317$. Hence,
the raw simulation estimator I, which on each iteration takes the value 0 if the sum of
the Bernoullis with parameter 0.4 is less than 16 and takes the value 1 otherwise, will
have variance

$$\mathrm{Var}(I) = \theta(1 - \theta) = 3.169 \times 10^{-4}$$

On the other hand, it follows from the fact that $0 \leq \hat{\theta} \leq 0.001236$ that (see Exercise 33)

$$\mathrm{Var}(\hat{\theta}) \leq 2.9131 \times 10^{-7} \qquad \blacksquare$$
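
The numerical illustration of Example 11.23 can be reproduced with the following Python
sketch, which simulates the tilted Bernoullis with $e^{t^*} = 6$ and compares the importance
sampling estimate with the exact binomial tail. The function names, seed, and number of
runs are assumptions for illustration.

```python
import math
import random
import statistics

def exact_tail(n, p, a):
    """P{Binomial(n, p) >= a}, used only to check the simulation."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

def importance_sampling_values(n, p, a, t, num_runs, rng):
    """Per-run values of I{S >= a} M(t) exp(-t S) under the tilted Bernoullis."""
    tilted_p = p * math.exp(t) / (p * math.exp(t) + 1 - p)
    mgf = (p * math.exp(t) + 1 - p) ** n      # M(t), the mgf of S
    values = []
    for _ in range(num_runs):
        s = sum(rng.random() < tilted_p for _ in range(n))
        values.append(mgf * math.exp(-t * s) if s >= a else 0.0)
    return values

rng = random.Random(11)
n, p, a = 20, 0.4, 16
t_star = math.log(6.0)   # solves 20 * 0.4 e^t / (0.4 e^t + 0.6) = 16
values = importance_sampling_values(n, p, a, t_star, 20000, rng)
estimate = statistics.fmean(values)
print("exact:   ", exact_tail(n, p, a))
print("estimate:", estimate)
print("largest value seen:", max(values), " sample variance:", statistics.variance(values))
```

Every simulated value lies between 0 and $81/2^{16}$, so the sample variance stays far below
the variance $\theta(1 - \theta)$ of the raw indicator estimator.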

Example 11.24 Consider a single-server queue in which the times between successive
customer arrivals have density function f and the service times have density g. Let
$D_n$ denote the amount of time that the nth arrival spends waiting in queue and suppose
we are interested in estimating $\alpha = P\{D_n \geq a\}$ when a is much larger than $E[D_n]$.
Rather than generating the successive interarrival and service times according to f and
g, respectively, they should be generated according to the densities $f_{-t}$ and $g_t$, where
t is a positive number to be determined. Note that using these distributions as opposed
to f and g will result in smaller interarrival times (since $-t < 0$) and larger service
times. Hence, there will be a greater chance that $D_n > a$ than if we had simulated
using the densities f and g. The importance sampling estimator of $\alpha$ would then be

$$\hat{\alpha} = I\{D_n > a\}\, e^{t(S_n - Y_n)}\, [M_f(-t)\, M_g(t)]^n$$

where $S_n$ is the sum of the first n interarrival times, $Y_n$ is the sum of the first n service
times, and $M_f$ and $M_g$ are the moment generating functions of the densities f and g,
respectively. The value of t used should be determined by experimenting with a variety
of different choices. ■


11.7 Determining the Number of Runs 

Suppose that we are going to use simulation to generate r independent and identically
distributed random variables $Y^{(1)}, \ldots, Y^{(r)}$ having mean $\mu$ and variance $\sigma^2$. We are
then going to use

$$\bar{Y}_r = \frac{Y^{(1)} + \cdots + Y^{(r)}}{r}$$

as an estimate of $\mu$. The precision of this estimate can be measured by its variance

$$\mathrm{Var}(\bar{Y}_r) = E[(\bar{Y}_r - \mu)^2] = \sigma^2/r$$

Hence, we would want to choose r, the number of necessary runs, large enough so that
$\sigma^2/r$ is acceptably small. However, the difficulty is that $\sigma^2$ is not known in advance.
To get around this, you should initially simulate k runs (where $k \geq 30$) and then use
the simulated values $Y^{(1)}, \ldots, Y^{(k)}$ to estimate $\sigma^2$ by the sample variance

$$\sum_{i=1}^{k} (Y^{(i)} - \bar{Y}_k)^2/(k - 1)$$

Based on this estimate of $\sigma^2$ the value of r that attains the desired level of precision
can now be determined and an additional $r - k$ runs can be generated.
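
The two-stage procedure can be sketched in Python as follows. The helper name
`runs_needed`, the exponential test variables, and the target variance are assumptions
used only to make the sketch concrete.

```python
import math
import random
import statistics

def runs_needed(pilot_values, target_variance):
    """Total number of runs r so that (sample variance) / r <= target_variance.

    Never returns fewer runs than were already made in the pilot stage.
    """
    s2 = statistics.variance(pilot_values)       # divides by k - 1
    return max(len(pilot_values), math.ceil(s2 / target_variance))

rng = random.Random(3)
pilot = [rng.expovariate(1.0) for _ in range(30)]     # k = 30 pilot runs
r = runs_needed(pilot, target_variance=0.001)
extra = [rng.expovariate(1.0) for _ in range(r - len(pilot))]  # the r - k further runs
estimate = statistics.fmean(pilot + extra)
print("runs used:", r, " estimate of the mean:", estimate)
```

Since r is computed from an estimate of $\sigma^2$, the precision actually attained is only
approximately the one requested.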


11.8 Generating from the Stationary Distribution of a Markov Chain

11.8.1 Coupling from the Past 

Consider an irreducible Markov chain with states $1, \ldots, m$ and transition probabilities
$P_{i,j}$ and suppose we want to generate the value of a random variable whose distribution
is that of the stationary distribution of this Markov chain. Whereas we could
approximately generate such a random variable by arbitrarily choosing an initial state,
simulating the resulting Markov chain for a large fixed number of time periods, and
then choosing the final state as the value of the random variable, we will now present
a procedure that generates a random variable whose distribution is exactly that of the
stationary distribution.

If, in theory, we generated the Markov chain starting at time $-\infty$ in any arbitrary
state, then the state at time 0 would have the stationary distribution. So imagine that
we do this, and suppose that a different person is to generate the next state at each of
these times. Thus, if $X(-n)$, the state at time $-n$, is i, then person $-n$ would generate
a random variable that is equal to j with probability $P_{i,j}$, $j = 1, \ldots, m$, and the value
generated would be the state at time $-(n - 1)$. Now suppose that person $-1$ wants to do
his random variable generation early. Because he does not know what the state at time
$-1$ will be, he generates a sequence of random variables $N_{-1}(i)$, $i = 1, \ldots, m$, where
$N_{-1}(i)$, the next state if $X(-1) = i$, is equal to j with probability $P_{i,j}$, $j = 1, \ldots, m$.
If it results that $X(-1) = i$, then person $-1$ would report that the state at time 0 is

$$S_{-1}(i) = N_{-1}(i), \quad i = 1, \ldots, m$$

(That is, $S_{-1}(i)$ is the simulated state at time 0 when the simulated state at time
$-1$ is i.)

Now suppose that person $-2$, hearing that person $-1$ is doing his simulation
early, decides to do the same thing. She generates a sequence of random variables
$N_{-2}(i)$, $i = 1, \ldots, m$, where $N_{-2}(i)$ is equal to j with probability $P_{i,j}$, $j = 1, \ldots, m$.
Consequently, if it is reported to her that $X(-2) = i$, then she will report that
$X(-1) = N_{-2}(i)$. Combining this with the early generation of person $-1$ shows
that if $X(-2) = i$, then the simulated state at time 0 is

$$S_{-2}(i) = S_{-1}(N_{-2}(i)), \quad i = 1, \ldots, m$$

Continuing in the preceding manner, suppose that person $-3$ generates a sequence
of random variables $N_{-3}(i)$, $i = 1, \ldots, m$, where $N_{-3}(i)$ is to be the generated value
of the next state when $X(-3) = i$. Consequently, if $X(-3) = i$ then the simulated
state at time 0 would be

$$S_{-3}(i) = S_{-2}(N_{-3}(i)), \quad i = 1, \ldots, m$$

Now suppose we continue the preceding, and so obtain the simulated functions

$$S_{-1}(i),\ S_{-2}(i),\ S_{-3}(i),\ \ldots, \quad i = 1, \ldots, m$$

Going backward in time in this manner, we will at some time, say $-r$, have a simulated
function $S_{-r}(i)$ that is a constant function. That is, for some state j, $S_{-r}(i)$ will equal j
for all states $i = 1, \ldots, m$. But this means that no matter what the simulated values from
time $-\infty$ to $-r$, we can be certain that the simulated value at time 0 is j. Consequently,
j can be taken as the value of a generated random variable whose distribution is exactly
that of the stationary distribution of the Markov chain.

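Example 11.25 Consider a Markov chain with states 1, 2, 3 and suppose that simulation
yielded the values

$$N_{-1}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 2, & \text{if } i = 3 \end{cases}$$

and

$$N_{-2}(i) = \begin{cases} 1, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}$$

Then

$$S_{-2}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}$$

If

$$N_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 1, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}$$

then

$$S_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}$$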

Therefore, no matter what the state is at time —3, the state at time 0 will be 3. 
Remark The procedure developed in this section for generating a random variable
whose distribution is the stationary distribution of the Markov chain is called coupling 
from the past. 

11.8.2 Another Approach 

Consider a Markov chain whose state space is the nonnegative integers. Suppose the
chain has stationary probabilities, and denote them by $\pi_i$, $i \ge 0$. We now present
another way of simulating a random variable whose distribution is given by the $\pi_i$, $i \ge 0$,
which can be utilized if the chain satisfies the following property: for some
state, which we will call state 0, and some positive number $\alpha$,

$$P_{i,0} \ge \alpha > 0$$

for all states $i$. That is, whatever the current state, the probability that the next state will
be 0 is at least some positive value $\alpha$.

To simulate a random variable distributed according to the stationary probabilities,
start by simulating the Markov chain in the obvious manner. Namely, whenever
the chain is in state $i$, generate a random variable that is equal to $j$ with probability
$P_{i,j}$, $j \ge 0$, and then set the next state equal to the generated value of this random
variable. In addition, however, whenever a transition into state 0 occurs, a coin, whose
probability of coming up heads depends on the state from which the transition occurred,
is flipped. Specifically, if the transition into state 0 was from state $i$, then the coin flipped
has probability $\alpha/P_{i,0}$ of coming up heads. Call such a coin an $i$-coin, $i \ge 0$. If the coin
comes up heads then we say that an event has occurred. Consequently, each transition
of the Markov chain results in an event with probability $\alpha$, implying that events occur
at rate $\alpha$. Now say that an event is an $i$-event if it resulted from a transition out of state
$i$; that is, an event is an $i$-event if it resulted from the flip of an $i$-coin. Because $\pi_i$ is
the proportion of transitions that are out of state $i$, and each such transition will result
in an $i$-event with probability $\alpha$, it follows that the rate at which $i$-events occur is $\alpha\pi_i$.
Therefore, the proportion of all events that are $i$-events is $\alpha\pi_i/\alpha = \pi_i$, $i \ge 0$.

Now, suppose that $X_0 = 0$. Fix $i$, and let $I_j$ equal 1 if the $j$th event that occurs is
an $i$-event, and let $I_j$ equal 0 otherwise. Because an event always leaves the chain in
state 0 it follows that $I_j$, $j \ge 1$, are independent and identically distributed random
variables. Because the proportion of the $I_j$ that are equal to 1 is $\pi_i$, we see that

$$\pi_i = \lim_{n \to \infty} \frac{I_1 + \cdots + I_n}{n} = E[I_1] = P(I_1 = 1)$$

where the second equality follows from the strong law of large numbers. Hence,
if we let

$$T = \min\{n > 0 : \text{an event occurs at time } n\}$$

denote the time of the first event, then it follows from the preceding that

$$\pi_i = P(I_1 = 1) = P(X_{T-1} = i)$$

As the preceding is true for all states $i$, it follows that $X_{T-1}$, the state of the Markov
chain at time $T - 1$, has the stationary distribution.
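As a rough illustration of this second approach, the following Python sketch runs the chain from state 0, flips the appropriate coin on each transition into state 0, and returns the state occupied just before the first event. It assumes a finite state space so that the transition matrix can be stored as a list of rows; the function name `draw_stationary` and its parameters are hypothetical.

```python
import random

def draw_stationary(P, alpha, rng=None):
    """Return X_{T-1}, where T is the time of the first event.

    Assumes P[i][0] >= alpha > 0 for every state i (finite chain for illustration).
    """
    rng = rng or random.Random()
    state = 0  # X_0 = 0
    while True:
        # Simulate one transition out of the current state.
        u, cum, nxt = rng.random(), 0.0, len(P[state]) - 1
        for j, p in enumerate(P[state]):
            cum += p
            if u < cum:
                nxt = j
                break
        # On a transition into 0 from `state`, flip a coin with P(heads) = alpha / P_{state,0}.
        if nxt == 0 and rng.random() < alpha / P[state][0]:
            return state  # an event occurred; report the state just before it
        state = nxt
```

For a two-state chain with rows (0.5, 0.5) and (0.2, 0.8), one may take `alpha = 0.2`, and the long-run frequencies of the returned states should be close to the stationary probabilities 2/7 and 5/7.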


Exercises 


*1. Suppose it is relatively easy to simulate from the distributions $F_i$, $i = 1, 2, \ldots, n$.
If $n$ is small, how can we simulate from

$$F(x) = \sum_{i=1}^{n} P_i F_i(x), \quad P_i \ge 0, \quad \sum_{i} P_i = 1?$$

Give a method for simulating from

$$F(x) = \begin{cases} \dfrac{1 - e^{-2x} + 2x}{3}, & 0 < x < 1 \\[4pt] \dfrac{3 - e^{-2x}}{3}, & 1 < x < \infty \end{cases}$$


2. Give a method for simulating a negative binomial random variable. 

*3. Give a method for simulating a hypergeometric random variable. 

4. Suppose we want to simulate a point located at random in a circle of radius $r$
centered at the origin. That is, we want to simulate $X$, $Y$ having joint density

$$f(x, y) = \frac{1}{\pi r^2}, \quad x^2 + y^2 \le r^2$$

(a) Let $R = \sqrt{X^2 + Y^2}$, $\theta = \tan^{-1}(Y/X)$ denote the polar coordinates. Compute
the joint density of $R$, $\theta$ and use this to give a simulation method. Another
method for simulating $X$, $Y$ is as follows:

Step 1: Generate independent random numbers $U_1$, $U_2$ and set $Z_1 = 2rU_1 - r$,
$Z_2 = 2rU_2 - r$. Then $(Z_1, Z_2)$ is uniform in the square whose
sides are of length $2r$ and which encloses the circle of radius $r$ (see
Figure 11.6).

Step 2: If $(Z_1, Z_2)$ lies in the circle of radius $r$, that is, if $Z_1^2 + Z_2^2 \le r^2$, set
$(X, Y) = (Z_1, Z_2)$. Otherwise return to step 1.

(b) Prove that this method works, and compute the distribution of the number of
random numbers it requires.
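Steps 1 and 2 of Exercise 4 can be written out as the following Python sketch. It only transcribes the stated procedure and does not answer part (b); the function name `uniform_in_circle` and the returned count of random numbers are illustrative assumptions.

```python
import random

def uniform_in_circle(r, rng=None):
    """Rejection method: return ((X, Y), number of random numbers used)."""
    rng = rng or random.Random()
    used = 0
    while True:
        # Step 1: a point uniform in the square of side 2r centered at the origin.
        z1 = 2 * r * rng.random() - r
        z2 = 2 * r * rng.random() - r
        used += 2
        # Step 2: accept only if the point lies inside the circle of radius r.
        if z1 * z1 + z2 * z2 <= r * r:
            return (z1, z2), used
```

Returning the count makes it easy to tabulate empirically how many random numbers the method consumes per accepted point.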

5. Suppose it is relatively easy to simulate from $F_i$ for each $i = 1, \ldots, n$. How can
we simulate from

(a) $F(x) = \prod_{i=1}^{n} F_i(x)$?

(b) $F(x) = 1 - \prod_{i=1}^{n} (1 - F_i(x))$?

(c) Give two methods for simulating from the distribution $F(x) = x^n$, $0 < x < 1$.

Figure 11.6


*6. In Example 11.4 we simulated the absolute value of a standard normal by using the
Von Neumann rejection procedure on exponential random variables with rate 1.
This raises the question of whether we could obtain a more efficient algorithm by
using a different exponential density, that is, by using the density $g(x) = \lambda e^{-\lambda x}$.
Show that the mean number of iterations needed in the rejection scheme
is minimized when $\lambda = 1$.

7. Give an algorithm for simulating a random variable having density function 

$$f(x) = 30(x^2 - 2x^3 + x^4), \quad 0 < x < 1$$

8. Consider the technique of simulating a gamma $(n, \lambda)$ random variable by using
the rejection method with $g$ being an exponential density with rate $\lambda/n$.

(a) Show that the average number of iterations of the algorithm needed to generate
a gamma is $n^n e^{1-n}/(n-1)!$.

(b) Use Stirling's approximation to show that for large $n$ the answer to part (a) is
approximately equal to $e[(n-1)/(2\pi)]^{1/2}$.

(c) Show that the procedure is equivalent to the following:

Step 1: Generate $Y_1$ and $Y_2$, independent exponentials with rate 1.

Step 2: If $Y_1 < (n-1)[Y_2 - \log(Y_2) - 1]$, return to step 1.

Step 3: Set $X = nY_2/\lambda$.

(d) Explain how to obtain an independent exponential along with a gamma from
the preceding algorithm.

9. Set up the alias method for simulating from a binomial random variable with 
parameters n = 6, p = 0.4. 

10. Explain how we can number the $Q^{(k)}$ in the alias method so that $k$ is one of the
two points to which $Q^{(k)}$ gives weight.

Hint: Rather than giving the initial $Q$ the name $Q^{(1)}$, what else could we call
it?

11. Complete the details of Example 11.10. 
12. Let $X_1, \ldots, X_k$ be independent with

$$P\{X_i = j\} = \frac{1}{n}, \quad j = 1, \ldots, n, \quad i = 1, \ldots, k$$

If $D$ is the number of distinct values among $X_1, \ldots, X_k$ show that

$$E[D] = n\left[1 - \left(\frac{n-1}{n}\right)^k\right] \approx k - \frac{k^2}{2n} \quad \text{when } \frac{k^2}{n} \text{ is small}$$


13. The Discrete Rejection Method: Suppose we want to simulate $X$ having probability
mass function $P\{X = i\} = P_i$, $i = 1, \ldots, n$ and suppose we can easily
simulate from the probability mass function $Q_i$, $\sum_i Q_i = 1$, $Q_i \ge 0$. Let $C$ be
such that $P_i \le C Q_i$, $i = 1, \ldots, n$. Show that the following algorithm generates
the desired random variable:

Step 1: Generate $Y$ having mass function $Q$ and $U$ an independent random
number.

Step 2: If $U \le P_Y/(C Q_Y)$, set $X = Y$. Otherwise return to step 1.
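The two steps of the discrete rejection method translate directly into code. The sketch below is a transcription of the stated algorithm, not a proof that it works; values are indexed from 0, and the names `discrete_rejection` and `sample_Q` (a caller-supplied function that draws from $Q$) are hypothetical.

```python
import random

def discrete_rejection(P, Q, C, sample_Q, rng=None):
    """Simulate X with mass function P by rejection from Q, where P[i] <= C * Q[i]."""
    rng = rng or random.Random()
    while True:
        y = sample_Q(rng)          # Step 1: Y has mass function Q ...
        u = rng.random()           # ... and U is an independent random number
        if u <= P[y] / (C * Q[y]):  # Step 2: accept with probability P_Y / (C Q_Y)
            return y
```

For example, with `P = [0.1, 0.2, 0.3, 0.4]`, a uniform `Q` on four points, and `C = 1.6`, one could pass `sample_Q = lambda rng: rng.randrange(4)`.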

14. The Discrete Hazard Rate Method: Let $X$ denote a nonnegative integer valued
random variable. The function $\lambda(n) = P\{X = n \mid X \ge n\}$, $n \ge 0$, is called the
discrete hazard rate function.

(a) Show that $P\{X = n\} = \lambda(n) \prod_{i=0}^{n-1} (1 - \lambda(i))$.

(b) Show that we can simulate $X$ by generating random numbers $U_1, U_2, \ldots$
stopping at

$$X = \min\{n : U_n \le \lambda(n)\}$$

(c) Apply this method to simulating a geometric random variable. Explain, intuitively,
why it works.

(d) Suppose that $\lambda(n) \le p < 1$ for all $n$. Consider the following algorithm for
simulating $X$ and explain why it works: Simulate $X_i$, $U_i$, $i \ge 1$ where $X_i$ is
geometric with mean $1/p$ and $U_i$ is a random number. Set $S_k = X_1 + \cdots + X_k$
and let

$$X = \min\{S_k : U_k \le \lambda(S_k)/p\}$$

15. Suppose you have just simulated a normal random variable $X$ with mean $\mu$ and
variance $\sigma^2$. Give an easy way to generate a second normal variable with the same
mean and variance that is negatively correlated with $X$.

*16. Suppose $n$ balls having weights $w_1, w_2, \ldots, w_n$ are in an urn. These balls are
sequentially removed in the following manner: At each selection, a given ball in
the urn is chosen with a probability equal to its weight divided by the sum of the
weights of the other balls that are still in the urn. Let $I_1, I_2, \ldots, I_n$ denote the
order in which the balls are removed; thus $I_1, \ldots, I_n$ is a random permutation
with weights.

(a) Give a method for simulating $I_1, \ldots, I_n$.

(b) Let $X_i$ be independent exponentials with rates $w_i$, $i = 1, \ldots, n$. Explain how
$X_i$ can be utilized to simulate $I_1, \ldots, I_n$.

17. Order Statistics: Let $X_1, \ldots, X_n$ be i.i.d. from a continuous distribution $F$, and
let $X_{(i)}$ denote the $i$th smallest of $X_1, \ldots, X_n$, $i = 1, \ldots, n$. Suppose we want to
simulate $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$. One approach is to simulate $n$ values from
$F$, and then order these values. However, this ordering, or sorting, can be time
consuming when $n$ is large.

(a) Suppose that $\lambda(t)$, the hazard rate function of $F$, is bounded. Show how the
hazard rate method can be applied to generate the $n$ variables in such a manner
that no sorting is necessary.

Suppose now that $F^{-1}$ is easily computed.

(b) Argue that $X_{(1)}, \ldots, X_{(n)}$ can be generated by simulating $U_{(1)} < U_{(2)} <
\cdots < U_{(n)}$, the ordered values of $n$ independent random numbers, and then
setting $X_{(i)} = F^{-1}(U_{(i)})$. Explain why this means that $X_{(i)}$ can be generated
from $F^{-1}(\beta_i)$ where $\beta_i$ is beta with parameters $i$, $n - i + 1$.

(c) Argue that $U_{(1)}, \ldots, U_{(n)}$ can be generated, without any need for sorting, by
simulating i.i.d. exponentials $Y_1, \ldots, Y_{n+1}$ and then setting

$$U_{(i)} = \frac{Y_1 + \cdots + Y_i}{Y_1 + \cdots + Y_{n+1}}, \quad i = 1, \ldots, n$$

Hint: Given the time of the $(n + 1)$st event of a Poisson process, what can be
said about the set of times of the first $n$ events?

(d) Show that if $U_{(n)} = y$ then $U_{(1)}, \ldots, U_{(n-1)}$ has the same joint distribution
as the order statistics of a set of $n - 1$ uniform $(0, y)$ random variables.

(e) Use part (d) to show that $U_{(1)}, \ldots, U_{(n)}$ can be generated as follows:

Step 1: Generate random numbers $U_1, \ldots, U_n$.

Step 2: Set

$$U_{(n)} = U_1^{1/n}, \quad U_{(n-1)} = U_{(n)} (U_2)^{1/(n-1)},$$

$$U_{(j-1)} = U_{(j)} (U_{n-j+2})^{1/(j-1)}, \quad j = 2, \ldots, n - 1$$
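The constructions in parts (c) and (e) of Exercise 17 can be sketched as follows. Both functions return the ordered values $U_{(1)}, \ldots, U_{(n)}$ without sorting; the function names are hypothetical, and the code illustrates the stated recipes rather than the arguments the exercise asks for.

```python
import random

def ordered_uniforms_from_exponentials(n, rng=None):
    """Part (c): U_(i) = (Y_1 + ... + Y_i) / (Y_1 + ... + Y_{n+1})."""
    rng = rng or random.Random()
    y = [rng.expovariate(1.0) for _ in range(n + 1)]
    total = sum(y)
    out, partial = [], 0.0
    for i in range(n):
        partial += y[i]
        out.append(partial / total)
    return out

def ordered_uniforms_recursive(n, rng=None):
    """Part (e): U_(n) = U_1^(1/n), then U_(j-1) = U_(j) * U^(1/(j-1)) going downward."""
    rng = rng or random.Random()
    u = [0.0] * (n + 1)  # u[j] holds U_(j); index 0 is unused
    u[n] = rng.random() ** (1.0 / n)
    for j in range(n, 1, -1):
        u[j - 1] = u[j] * rng.random() ** (1.0 / (j - 1))
    return u[1:]
```

Applying $F^{-1}$ to each returned value, as in part (b), would then give the ordered sample from $F$.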


18. Let $X_1, \ldots, X_n$ be independent exponential random variables each having rate
1. Set

$$W_1 = X_1/n,$$

$$W_i = W_{i-1} + \frac{X_i}{n - i + 1}, \quad i = 2, \ldots, n$$

Explain why $W_1, \ldots, W_n$ has the same joint distribution as the order statistics of
a sample of $n$ exponentials each having rate 1.
19. Suppose we want to simulate a large number $n$ of independent exponentials with
rate 1; call them $X_1, X_2, \ldots, X_n$. If we were to employ the inverse transform
technique we would require one logarithmic computation for each exponential
generated. One way to avoid this is to first simulate $S_n$, a gamma random variable
with parameters $(n, 1)$ (say, by the method of Section 11.3.3). Now interpret $S_n$
as the time of the $n$th event of a Poisson process with rate 1 and use the result
that given $S_n$ the set of the first $n - 1$ event times is distributed as the set of $n - 1$
independent uniform $(0, S_n)$ random variables. Based on this, explain why the
following algorithm simulates $n$ independent exponentials:

Step 1: Generate $S_n$, a gamma random variable with parameters $(n, 1)$.

Step 2: Generate $n - 1$ random numbers $U_1, U_2, \ldots, U_{n-1}$.

Step 3: Order the $U_i$, $i = 1, \ldots, n - 1$ to obtain $U_{(1)} < U_{(2)} < \cdots < U_{(n-1)}$.

Step 4: Let $U_{(0)} = 0$, $U_{(n)} = 1$, and set $X_i = S_n(U_{(i)} - U_{(i-1)})$, $i = 1, \ldots, n$.

When the ordering (step 3) is performed according to the algorithm described in
Section 11.5, the preceding is an efficient method for simulating $n$ exponentials
when all $n$ are simultaneously required. If memory space is limited, however, and
the exponentials can be employed sequentially, discarding each exponential from
memory once it has been used, then the preceding may not be appropriate.
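The four steps of Exercise 19 can be sketched in Python as below. Two substitutions are assumptions made for brevity: the gamma variable is drawn with the standard library's `random.gammavariate` rather than the method of Section 11.3.3, and step 3 uses the built-in `sorted` rather than the ordering algorithm of Section 11.5. The function name is hypothetical.

```python
import random

def exponentials_from_gamma(n, rng=None):
    """Return n values intended to behave as independent rate-1 exponentials."""
    rng = rng or random.Random()
    s_n = rng.gammavariate(n, 1.0)                    # Step 1: S_n ~ gamma(n, 1)
    u = sorted(rng.random() for _ in range(n - 1))    # Steps 2-3: ordered random numbers
    pts = [0.0] + u + [1.0]                           # Step 4: U_(0) = 0, U_(n) = 1
    return [s_n * (pts[i] - pts[i - 1]) for i in range(1, n + 1)]
```

By construction the returned values sum to $S_n$, which gives a quick sanity check on an implementation.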

20. Consider the following procedure for randomly choosing a subset of size $k$ from
the numbers $1, 2, \ldots, n$: Fix $p$ and generate the first $n$ time units of a renewal
process whose interarrival distribution is geometric with mean $1/p$, that is,
$P\{\text{interarrival time} = k\} = p(1 - p)^{k-1}$, $k = 1, 2, \ldots$. Suppose events occur
at times $i_1 < i_2 < \cdots < i_m \le n$. If $m = k$, stop; $i_1, \ldots, i_m$ is the desired
set. If $m > k$, then randomly choose (by some method) a subset of size $k$ from
$i_1, \ldots, i_m$ and then stop. If $m < k$, take $i_1, \ldots, i_m$ as part of the subset of size
$k$ and then select (by some method) a random subset of size $k - m$ from the set
$\{1, 2, \ldots, n\} - \{i_1, \ldots, i_m\}$. Explain why this algorithm works. As $E[N(n)] = np$
a reasonable choice of $p$ is to take $p \approx k/n$. (This approach is due to Dieter.)

21. Consider the following algorithm for generating a random permutation of the
elements $1, 2, \ldots, n$. In this algorithm, $P(i)$ can be interpreted as the element in
position $i$.

Step 1: Set $k = 1$.

Step 2: Set $P(1) = 1$.

Step 3: If $k = n$, stop. Otherwise, let $k = k + 1$.

Step 4: Generate a random number $U$, and let

$$P(k) = P([kU] + 1),$$

$$P([kU] + 1) = k.$$

Go to step 3.

(a) Explain in words what the algorithm is doing.

(b) Show that at iteration $k$, that is, when the value of $P(k)$ is initially set,
$P(1), P(2), \ldots, P(k)$ is a random permutation of $1, 2, \ldots, k$.

Hint: Use induction and argue that

$$P_k\{i_1, i_2, \ldots, i_{j-1}, k, i_j, \ldots, i_{k-2}, i\} = P_{k-1}\{i_1, i_2, \ldots, i_{j-1}, i, i_j, \ldots, i_{k-2}\}\,\frac{1}{k} = \frac{1}{k!} \quad \text{by the induction hypothesis}$$

The preceding algorithm can be used even if $n$ is not initially known.
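Steps 1 through 4 of Exercise 21 can be transcribed as the Python sketch below. It is offered only as an illustration of the stated steps; the function name is hypothetical, and the `min` guard against floating-point rounding of $kU$ is an addition not present in the text.

```python
import random

def random_permutation(n, rng=None):
    """Build a permutation of 1..n one element at a time, as in Steps 1-4."""
    rng = rng or random.Random()
    P = [0] * (n + 1)  # P[i] is the element in position i; index 0 is unused
    P[1] = 1           # Steps 1-2
    for k in range(2, n + 1):                    # Step 3
        j = min(int(k * rng.random()), k - 1) + 1  # [kU] + 1, a position in 1..k
        P[k] = P[j]    # Step 4: move the element at position j to the new position k ...
        P[j] = k       # ... and put the new element k at position j
    return P[1:]
```

Because each pass only needs the current value of $k$, the loop could equally be driven by a stream of unknown length, which matches the closing remark of the exercise.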

22. Verify that if we use the hazard rate approach to simulate the event times of
a nonhomogeneous Poisson process whose intensity function $\lambda(t)$ is such that
$\lambda(t) \le \lambda$ then we end up with the approach given in method 1 of Section 11.5.

*23. For a nonhomogeneous Poisson process with intensity function $\lambda(t)$, $t \ge 0$, where
$\int_0^\infty \lambda(t)\,dt = \infty$, let $X_1, X_2, \ldots$ denote the sequence of times at which events
occur.

(a) Show that $\int_0^{X_1} \lambda(t)\,dt$ is exponential with rate 1.

(b) Show that $\int_{X_{i-1}}^{X_i} \lambda(t)\,dt$, $i \ge 1$, are independent exponentials with rate 1,
where $X_0 = 0$.

In words, independent of the past, the additional amount of hazard that must be
experienced until an event occurs is exponential with rate 1.

24. Give an efficient method for simulating a nonhomogeneous Poisson process with
intensity function

$$\lambda(t) = b + \frac{1}{t + a}, \quad t \ge 0$$

25. Let $(X, Y)$ be uniformly distributed in a circle of radius $r$ about the origin. That
is, their joint density is given by

$$f(x, y) = \frac{1}{\pi r^2}, \quad 0 \le x^2 + y^2 \le r^2$$

Let $R = \sqrt{X^2 + Y^2}$ and $\theta = \arctan(Y/X)$ denote their polar coordinates. Show
that $R$ and $\theta$ are independent with $\theta$ being uniform on $(0, 2\pi)$ and $P\{R < a\} =
a^2/r^2$, $0 < a < r$.

26. Let $R$ denote a region in the two-dimensional plane. Show that for a two-dimensional
Poisson process, given that there are $n$ points located in $R$, the
points are independently and uniformly distributed in $R$, that is, their density
is $f(x, y) = c$, $(x, y) \in R$ where $c$ is the inverse of the area of $R$.

27. Let $X_1, \ldots, X_n$ be independent random variables with $E[X_i] = \theta$, $\mathrm{Var}(X_i) = \sigma_i^2$,
$i = 1, \ldots, n$, and consider estimates of $\theta$ of the form $\sum_{i=1}^{n} \lambda_i X_i$ where
$\sum_{i=1}^{n} \lambda_i = 1$. Show that $\mathrm{Var}\left(\sum_{i=1}^{n} \lambda_i X_i\right)$ is minimized when

$$\lambda_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{n} 1/\sigma_j^2}, \quad i = 1, \ldots, n$$

Possible Hint: If you cannot do this for general $n$, try it first when $n = 2$.

Figure 11.7 (labels: $(1, b)$, $g(x)$, $(1, 0)$)

The following two problems are concerned with the estimation of $\int_0^1 g(x)\,dx =
E[g(U)]$ where $U$ is uniform (0, 1).

28. The Hit-Miss Method: Suppose $g$ is bounded in $[0, 1]$; for instance, suppose
$0 \le g(x) \le b$ for $x \in [0, 1]$. Let $U_1$, $U_2$ be independent random numbers and
set $X = U_1$, $Y = bU_2$, so the point $(X, Y)$ is uniformly distributed in a rectangle
of length 1 and height $b$. Now set

$$I = \begin{cases} 1, & \text{if } Y < g(X) \\ 0, & \text{otherwise} \end{cases}$$

That is, accept $(X, Y)$ if it falls in the shaded area of Figure 11.7.

(a) Show that $E[bI] = \int_0^1 g(x)\,dx$.

(b) Show that $\mathrm{Var}(bI) \ge \mathrm{Var}(g(U))$, and so hit-miss has larger variance than
simply computing $g$ of a random number.
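To make the comparison in Exercise 28 concrete, the sketch below computes both the hit-miss estimate (the average of $bI$) and the estimate obtained by averaging $g$ of a random number. The function names `hit_miss` and `crude` are hypothetical, and the code illustrates the estimators only; it does not establish the variance inequality.

```python
import random

def hit_miss(g, b, n, rng=None):
    """Average of b*I over n points (X, Y) uniform on the rectangle [0,1] x [0,b]."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n):
        x = rng.random()        # X = U_1
        y = b * rng.random()    # Y = b * U_2
        if y < g(x):            # I = 1 when the point falls under the curve
            hits += 1
    return b * hits / n

def crude(g, n, rng=None):
    """Average of g(U) over n random numbers U."""
    rng = rng or random.Random()
    return sum(g(rng.random()) for _ in range(n)) / n
```

With `g = lambda x: x * x` and `b = 1`, both estimates should settle near 1/3 as `n` grows.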

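29. Stratified Sampling: Let $U_1, \ldots, U_n$ be independent random numbers and set
$\bar{U}_i = (U_i + i - 1)/n$, $i = 1, \ldots, n$. Hence, $\bar{U}_i$, $i \ge 1$, is uniform on
$((i - 1)/n, i/n)$. $\sum_{i=1}^{n} g(\bar{U}_i)/n$ is called the stratified sampling estimator of
$\int_0^1 g(x)\,dx$.

(a) Show that $E\left[\sum_{i=1}^{n} g(\bar{U}_i)/n\right] = \int_0^1 g(x)\,dx$.

(b) Show that $\mathrm{Var}\left[\sum_{i=1}^{n} g(\bar{U}_i)/n\right] \le \mathrm{Var}\left[\sum_{i=1}^{n} g(U_i)/n\right]$.

Hint: Let $U$ be uniform (0, 1) and define $N$ by $N = i$ if $(i - 1)/n < U <
i/n$, $i = 1, \ldots, n$. Now use the conditional variance formula to obtain

$$\mathrm{Var}(g(U)) = E[\mathrm{Var}(g(U) \mid N)] + \mathrm{Var}(E[g(U) \mid N])$$

$$\ge E[\mathrm{Var}(g(U) \mid N)]$$

$$= \sum_{i=1}^{n} \frac{\mathrm{Var}(g(U) \mid N = i)}{n}$$

$$= \sum_{i=1}^{n} \frac{\mathrm{Var}[g(\bar{U}_i)]}{n}$$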
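The stratified sampling estimator of Exercise 29 is short enough to state in code. In this sketch the loop index runs from 0 to $n - 1$, so `(U + i) / n` plays the role of $\bar{U}_{i+1} = (U_{i+1} + i)/n$; the function name is hypothetical.

```python
import random

def stratified_estimate(g, n, rng=None):
    """Average g over one uniform point drawn from each interval ((i-1)/n, i/n)."""
    rng = rng or random.Random()
    total = 0.0
    for i in range(n):
        u_bar = (rng.random() + i) / n  # uniform on (i/n, (i+1)/n)
        total += g(u_bar)
    return total / n
```

For a smooth integrand such as `g = lambda x: x * x`, the estimate with `n = 1000` is typically much closer to 1/3 than an average of 1000 unstratified values of `g(U)`.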
30. If $f$ is the density function of a normal random variable with mean $\mu$ and variance
$\sigma^2$, show that the tilted density $f_t$ is the density of a normal random variable with
mean $\mu + \sigma^2 t$ and variance $\sigma^2$.

31. Consider a queueing system in which each service time, independent of the past,
has mean $\mu$. Let $W_n$ and $D_n$ denote, respectively, the amounts of time customer $n$
spends in the system and in queue. Hence, $D_n = W_n - S_n$ where $S_n$ is the service
time of customer $n$. Therefore,

$$E[D_n] = E[W_n] - \mu$$

If we use simulation to estimate $E[D_n]$, should we

(a) use the simulated data to determine $D_n$, which is then used as an estimate of
$E[D_n]$; or

(b) use the simulated data to determine $W_n$ and then use this quantity minus $\mu$ as
an estimate of $E[D_n]$?

Repeat for when we want to estimate $E[W_n]$.

*32. Show that if $X$ and $Y$ have the same distribution then

$$\mathrm{Var}((X + Y)/2) \le \mathrm{Var}(X)$$

Hence, conclude that the use of antithetic variables can never increase variance
(though it need not be as efficient as generating an independent set of random
numbers).

33. If $0 \le X \le a$, show that

(a) $E[X^2] \le aE[X]$,

(b) $\mathrm{Var}(X) \le E[X](a - E[X])$,

(c) $\mathrm{Var}(X) \le a^2/4$.

34. Suppose in Example 11.19 that no new customers are allowed in the system after
time $t_0$. Give an efficient simulation estimator of the expected additional time
after $t_0$ until the system becomes empty.

35. Suppose we are able to simulate independent random variables $X$ and $Y$. If we
simulate $2k$ independent random variables $X_1, \ldots, X_k$ and $Y_1, \ldots, Y_k$, where the
$X_i$ have the same distribution as does $X$, and the $Y_j$ have the same distribution
as does $Y$, how would you use them to estimate $P(X < Y)$?

36. If $U_1$, $U_2$, $U_3$ are independent uniform (0, 1) random variables, find
$P\left(\prod_{i=1}^{3} U_i > 0.1\right)$.

Hint: Relate the desired probability to one about a Poisson process.
Solutions to Starred Exercises



Chapter 1 


2. $S = \{(r, g), (r, b), (g, r), (g, b), (b, r), (b, g)\}$ where, for instance, $(r, g)$ means
that the first marble drawn was red and the second one green. The probability of
each one of these outcomes is $\frac{1}{6}$.

5. If he wins, he only wins \$1; if he loses, he loses \$3.

9. $F = E \cup FE^c$, implying since $E$ and $FE^c$ are disjoint that $P(F) = P(E) +
P(FE^c)$.

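17. $P\{\text{end}\} = 1 - P\{\text{continue}\} = 1 - [\mathrm{Prob}(H, H, H) + \mathrm{Prob}(T, T, T)]$

Fair coin: $P\{\text{end}\} = 1 - \left[\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2}\right] = \frac{3}{4}$

Biased coin: $P\{\text{end}\} = 1 - \left[\frac{1}{4} \cdot \frac{1}{4} \cdot \frac{1}{4} + \frac{3}{4} \cdot \frac{3}{4} \cdot \frac{3}{4}\right] = \frac{9}{16}$

19. $E$ = event at least 1 six

$$P(E) = \frac{\text{number of ways to get } E}{\text{number of sample points}} = \frac{11}{36}$$

$D$ = event two faces are different

$$P(D) = 1 - P(\text{two faces the same}) = 1 - \frac{6}{36} = \frac{5}{6}$$

$$P(E \mid D) = \frac{P(ED)}{P(D)} = \frac{10/36}{5/6} = \frac{1}{3}$$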

Introduction to Probability Models, Eleventh Edition. http://dx.doi.org/10.1016/B978-0-12-407948-9.00017-7 
© 2014 Elsevier Inc. All rights reserved. 
25. (a) $P\{\text{pair}\} = P\{\text{second card is same denomination as first}\} = \frac{3}{51}$

(b) $$P\{\text{pair} \mid \text{different suits}\} = \frac{P\{\text{pair, different suits}\}}{P\{\text{different suits}\}} = \frac{P\{\text{pair}\}}{P\{\text{different suits}\}} = \frac{3/51}{39/51} = \frac{1}{13}$$


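27. $P(E_1) = 1$

$$P(E_2 \mid E_1) = \frac{39}{51}$$

since 12 cards are in the ace of spades pile and 39 are not.

$$P(E_3 \mid E_1 E_2) = \frac{26}{50}$$

since 24 cards are in the piles of the two aces and 26 are in the other two piles.

$$P(E_4 \mid E_1 E_2 E_3) = \frac{13}{49}$$

So

$$P\{\text{each pile has an ace}\} = \left(\frac{39}{51}\right)\left(\frac{26}{50}\right)\left(\frac{13}{49}\right)$$

30. (a) $$P\{\text{George} \mid \text{exactly 1 hit}\} = \frac{P\{\text{George, not Bill}\}}{P\{\text{exactly 1}\}} = \frac{P\{G, \text{not } B\}}{P\{G, \text{not } B\} + P\{B, \text{not } G\}} = \frac{(0.4)(0.3)}{(0.4)(0.3) + (0.7)(0.6)} = \frac{2}{9}$$

(b) $$P\{G \mid \text{hit}\} = \frac{P\{G, \text{hit}\}}{P\{\text{hit}\}} = \frac{P\{G\}}{P\{\text{hit}\}} = \frac{0.4}{1 - (0.3)(0.6)} = \frac{20}{41}$$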


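32. Let $E_i$ = event person $i$ selects own hat.

$$P(\text{no one selects own hat}) = 1 - P(E_1 \cup E_2 \cup \cdots \cup E_n)$$

$$= 1 - \left[\sum_{i_1} P(E_{i_1}) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)\right]$$

$$= 1 - \sum_{i_1} P(E_{i_1}) + \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) - \sum_{i_1 < i_2 < i_3} P(E_{i_1} E_{i_2} E_{i_3}) + \cdots + (-1)^n P(E_1 E_2 \cdots E_n)$$

Let $k \in \{1, 2, \ldots, n\}$. $P(E_{i_1} E_{i_2} \cdots E_{i_k})$ = number of ways $k$ specific men can select
own hats $\div$ total number of ways hats can be arranged $= (n - k)!/n!$. Number of
terms in summation $\sum_{i_1 < i_2 < \cdots < i_k}$ = number of ways to choose $k$ variables out of
$n$ variables $= \binom{n}{k} = n!/[k!(n - k)!]$. Thus,

$$\sum_{i_1 < \cdots < i_k} P(E_{i_1} E_{i_2} \cdots E_{i_k}) = \sum_{i_1 < \cdots < i_k} \frac{(n - k)!}{n!} = \binom{n}{k} \frac{(n - k)!}{n!} = \frac{1}{k!}$$

$$\therefore P(\text{no one selects own hat}) = 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!} = \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}$$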

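40. (a) $F$ = event fair coin flipped; $U$ = event two-headed coin flipped.

$$P(F \mid H) = \frac{P(H \mid F)P(F)}{P(H \mid F)P(F) + P(H \mid U)P(U)} = \frac{\frac{1}{2} \cdot \frac{1}{2}}{\frac{1}{2} \cdot \frac{1}{2} + 1 \cdot \frac{1}{2}} = \frac{1/4}{3/4} = \frac{1}{3}$$

(b) $$P(F \mid HH) = \frac{P(HH \mid F)P(F)}{P(HH \mid F)P(F) + P(HH \mid U)P(U)} = \frac{\frac{1}{4} \cdot \frac{1}{2}}{\frac{1}{4} \cdot \frac{1}{2} + 1 \cdot \frac{1}{2}} = \frac{1/8}{5/8} = \frac{1}{5}$$

(c) $$P(F \mid HHT) = \frac{P(HHT \mid F)P(F)}{P(HHT \mid F)P(F) + P(HHT \mid U)P(U)} = \frac{P(HHT \mid F)P(F)}{P(HHT \mid F)P(F) + 0} = 1$$

since the fair coin is the only one that can show tails.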

43. Let $B$ be the event that Flo has a blue-eyed gene. Using that Jo and Joe both have
one blue-eyed gene yields, upon letting $X$ be the number of blue-eyed genes
possessed by a daughter of theirs, that

$$P(B) = P\{X = 1 \mid X < 2\} = \frac{1/2}{3/4} = 2/3$$

Hence, with $C$ being the event that Flo's daughter is blue eyed, we obtain

$$P(C) = P(CB) = P(B)P(C \mid B) = 1/3$$
45. Let $B_i$ = event $i$th ball is black; $R_i$ = event $i$th ball is red.

$$P(B_1 \mid R_2) = \frac{P(R_2 \mid B_1)P(B_1)}{P(R_2 \mid B_1)P(B_1) + P(R_2 \mid R_1)P(R_1)}$$

$$= \frac{\dfrac{r}{b + r + c} \cdot \dfrac{b}{b + r}}{\dfrac{r}{b + r + c} \cdot \dfrac{b}{b + r} + \dfrac{r + c}{b + r + c} \cdot \dfrac{r}{b + r}}$$

$$= \frac{rb}{rb + (r + c)r} = \frac{b}{b + r + c}$$

48. Let $C$ be the event that the randomly chosen family owns a car, and let $H$ be the
event that the randomly chosen family owns a house.

$$P(CH^c) = P(C) - P(CH) = 0.6 - 0.2 = 0.4$$

and

$$P(C^c H) = P(H) - P(CH) = 0.3 - 0.2 = 0.1$$

giving the result

$$P(CH^c) + P(C^c H) = 0.5$$

Chapter 2 


4. (a) 1,2, 3, 4, 5, 6. 

(b) 1,2, 3, 4, 5, 6. 

(c) 2,3,..., 11, 12. 

(d) -5, -4, ..., 4, 5.

16. $1 - (0.95)^{52} - 52(0.95)^{51}(0.05)$.


18. (a) $$P\{X_i = x_i, i = 1, \ldots, r - 1 \mid X_r = j\} = \frac{P\{X_i = x_i, i = 1, \ldots, r - 1, X_r = j\}}{P\{X_r = j\}}$$

$$= \frac{\dfrac{n!}{x_1! \cdots x_{r-1}!\, j!}\, p_1^{x_1} \cdots p_{r-1}^{x_{r-1}} p_r^{j}}{\dfrac{n!}{(n - j)!\, j!}\, p_r^{j} (1 - p_r)^{n-j}} = \frac{(n - j)!}{x_1! \cdots x_{r-1}!} \prod_{i=1}^{r-1} \left(\frac{p_i}{1 - p_r}\right)^{x_i}$$

(b) The conditional distribution of $X_1, \ldots, X_{r-1}$ given that $X_r = j$ is multinomial
with parameters $n - j$, $p_i/(1 - p_r)$, $i = 1, \ldots, r - 1$.

(c) The preceding is true because given that $X_r = j$, each of the $n - j$ trials that
did not result in outcome $r$ resulted in outcome $i$ with probability $p_i/(1 - p_r)$, $i =
1, \ldots, r - 1$.

23. In order for $X$ to equal $n$, the first $n - 1$ flips must have $r - 1$ heads, and then the
$n$th flip must land heads. By independence the desired probability is thus

$$\binom{n - 1}{r - 1} p^{r-1} (1 - p)^{n-r} \times p$$

27. $P\{\text{same number of heads}\} = \sum_i P\{A = i, B = i\}$

Another argument is as follows:

P{# heads of A = # heads of B}

= P{# tails of A = # heads of B} since coin is fair
= P{k - # heads of A = # heads of B}

= P{k = total # heads}

38. $c = 2$, $P\{X > 2\} = \int_2^\infty 2e^{-2x}\,dx = e^{-4}$

47. Let $X_i$ be 1 if trial $i$ is a success and 0 otherwise.

(a) The largest value is 0.6. If $X_1 = X_2 = X_3$, then

$$1.8 = E[X] = 3E[X_1] = 3P\{X_1 = 1\}$$

and so $P\{X = 3\} = P\{X_1 = 1\} = 0.6$. That this is the largest value is seen
by Markov's inequality, which yields

$$P\{X \ge 3\} \le E[X]/3 = 0.6$$

(b) The smallest value is 0. To construct a probability scenario for which $P\{X =
3\} = 0$, let $U$ be a uniform random variable on (0, 1), and define

$$X_1 = \begin{cases} 1, & \text{if } U \le 0.6 \\ 0, & \text{otherwise} \end{cases}$$

$$X_2 = \begin{cases} 1, & \text{if } U \ge 0.4 \\ 0, & \text{otherwise} \end{cases}$$

$$X_3 = \begin{cases} 1, & \text{if either } U \le 0.3 \text{ or } U \ge 0.7 \\ 0, & \text{otherwise} \end{cases}$$

It is easy to see that

$$P\{X_1 = X_2 = X_3 = 1\} = 0$$

48. If $X$ is a nonnegative random variable, and $g$ is a differentiable function with
$g(0) = 0$, then

$$E[g(X)] = \int_0^\infty P(X > t)\, g'(t)\,dt$$

Let $f$ be the probability density function of $X$. One way to prove the result is to
integrate by parts ($dv = g'(t)\,dt$, $u = P(X > t)$) to obtain

$$\int_0^\infty P(X > t)\, g'(t)\,dt = P(X > t)\, g(t)\Big|_0^\infty + \int_0^\infty g(t) f(t)\,dt = E[g(X)]$$

Another way is to let $I(t)$ be the indicator function for the event that $X > t$. Then,

$$g(X) = \int_0^X g'(t)\,dt = \int_0^\infty I(t)\, g'(t)\,dt$$

Now take expectations of both sides to obtain the result.

49. $E[X^2] - (E[X])^2 = \mathrm{Var}(X) = E[(X - E[X])^2] \ge 0$. There is equality when
$\mathrm{Var}(X) = 0$, that is, when $X$ is constant.

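64. For the matching problem, letting $X = X_1 + \cdots + X_N$, where

$$X_i = \begin{cases} 1, & \text{if the } i\text{th man selects his own hat} \\ 0, & \text{otherwise} \end{cases}$$

we obtain

$$\mathrm{Var}(X) = \sum_{i=1}^{N} \mathrm{Var}(X_i) + 2 \sum\sum_{i < j} \mathrm{Cov}(X_i, X_j)$$

Since $P\{X_i = 1\} = 1/N$, we see

$$\mathrm{Var}(X_i) = \frac{1}{N}\left(1 - \frac{1}{N}\right) = \frac{N - 1}{N^2}$$

Also,

$$\mathrm{Cov}(X_i, X_j) = E[X_i X_j] - E[X_i]E[X_j]$$

Now,

$$X_i X_j = \begin{cases} 1, & \text{if the } i\text{th and } j\text{th men both select their own hats} \\ 0, & \text{otherwise} \end{cases}$$

and thus

$$E[X_i X_j] = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} = \frac{1}{N} \cdot \frac{1}{N - 1}$$

Hence,

$$\mathrm{Cov}(X_i, X_j) = \frac{1}{N(N - 1)} - \left(\frac{1}{N}\right)^2 = \frac{1}{N^2(N - 1)}$$

and

$$\mathrm{Var}(X) = \frac{N - 1}{N} + 2\binom{N}{2}\frac{1}{N^2(N - 1)} = \frac{N - 1}{N} + \frac{1}{N} = 1$$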

65. (a) $\binom{5}{3} p_2^3 (1 - p_2)^2$, where $p_2 = e^{-2} 2^2/2! = 2e^{-2}$

(b) $e^{-4} 4^6/6!$

(c) $e^{-2p}$

66. Letting $B_i$ be the event that $X_i \in A_i$, $i = 1, \ldots, n$, we have

$$P(B_1 \cdots B_n) = P(B_1) \prod_{i=2}^{n} P(B_i \mid B_1 \cdots B_{i-1}) = P(B_1) \prod_{i=2}^{n} P(B_i)$$


71. See Section 5.2.3 of Chapter 5. Another way is to use moment generating functions.
The moment generating function of the sum of $n$ independent exponentials with
rate $\lambda$ is equal to the product of their moment generating functions. That is, it is
$[\lambda/(\lambda - t)]^n$. But this is precisely the moment generating function of a gamma with
parameters $n$ and $\lambda$.

72. Let $X_i$, $i \ge 1$ be independent normal random variables with mean 100 and variance
100.

(a) Because $P(X_i < 115) = P(Z < 1.5) = .9332$, where $Z$ is a standard normal,
the desired probability is $1 - (.9332)^5$.

(b) Because $S = X_1 + \cdots + X_5$ is normal with mean 500 and variance 500

$$P(S > 530) = P\left(Z > \frac{30}{\sqrt{500}}\right)$$

73. (a) $P(S_{i,j}) = 1/365$

(b) and (c): yes

(d) no

(e) The number of successes is approximately Poisson with mean $\binom{n}{2}/365$.
Consequently

$$P(A) = P(0 \text{ successes}) \approx e^{-n(n-1)/730}$$

(f) $e^{-23 \times 22/730} \approx .5$

74. $E[e^{-uX}] = \sum_n e^{-un} e^{-\lambda} \lambda^n/n! = e^{-\lambda} \sum_n (\lambda e^{-u})^n/n! = e^{\lambda(e^{-u} - 1)}$


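80. Let $X_i$ be Poisson with mean 1. Then

$$P\left\{\sum_{i=1}^{n} X_i \le n\right\} = e^{-n} \sum_{k=0}^{n} \frac{n^k}{k!}$$

But for $n$ large $\sum_{i=1}^{n} X_i - n$ has approximately a normal distribution with mean 0,
and so the result follows.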

85. (a) Using that $\mathrm{Var}(X/\sigma_X) = \mathrm{Var}(Y/\sigma_Y) = 1$ and that $\mathrm{Var}(X/\sigma_X + Y/\sigma_Y) \ge 0$, along with the formula for the variance of a sum, gives

$$2 + 2\,\frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \ge 0$$

(b) Start with $\mathrm{Var}(X/\sigma_X - Y/\sigma_Y) \ge 0$, and proceed as in part (a).

(c) Squaring both sides yields that the inequality is equivalent to

$$\mathrm{Var}(X + Y) \le \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\sigma_X \sigma_Y$$

or, using the formula for the variance of a sum

$$\mathrm{Cov}(X, Y) \le \sigma_X \sigma_Y$$

which is part (b).

86. Let $X_i$ be the time it takes to process book $i$. With $Z$ being a standard normal

(a) $P\left(\sum_{i=1}^{40} X_i > 420\right) \approx P\left(Z > \frac{420 - 400}{\sqrt{9 \cdot 40}}\right)$

(b) $P\left(\sum_{i=1}^{25} X_i < 240\right) \approx P\left(Z < \frac{240 - 250}{\sqrt{9 \cdot 25}}\right) = P(Z > 2/3)$

87. (a) $$P(Z^2 < x) = P(-\sqrt{x} < Z < \sqrt{x}) = F_Z(\sqrt{x}) - F_Z(-\sqrt{x})$$

Differentiating yields

$$f_{Z^2}(x) = \tfrac{1}{2} x^{-1/2}\left[f_Z(\sqrt{x}) + f_Z(-\sqrt{x})\right] = \frac{1}{\sqrt{2\pi}}\, x^{-1/2} e^{-x/2}$$

(b) The sum of $n$ independent gamma random variables with parameters (1/2, 1/2)
is gamma with parameters $(n/2, 1/2)$.

Chapter 3 


2. Intuitively it would seem that the first head would be equally likely to occur on any
of trials $1, \ldots, n - 1$. That is, it is intuitive that

$$P\{X_1 = i \mid X_1 + X_2 = n\} = \frac{1}{n - 1}, \quad i = 1, \ldots, n - 1$$

Formally,

$$P\{X_1 = i \mid X_1 + X_2 = n\} = \frac{P\{X_1 = i, X_1 + X_2 = n\}}{P\{X_1 + X_2 = n\}}$$

$$= \frac{P\{X_1 = i, X_2 = n - i\}}{P\{X_1 + X_2 = n\}}$$

$$= \frac{p(1 - p)^{i-1}\, p(1 - p)^{n-i-1}}{\binom{n-1}{1} p(1 - p)^{n-2} p}$$

$$= \frac{1}{n - 1}$$

In the preceding, the next to last equality uses the independence of $X_1$ and $X_2$ to
evaluate the numerator and the fact that $X_1 + X_2$ has a negative binomial distribution
to evaluate the denominator.


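6. $$p_{X|Y}(1 \mid 3) = \frac{P\{X = 1, Y = 3\}}{P\{Y = 3\}} = \frac{P\{1 \text{ white}, 3 \text{ black}, 2 \text{ red}\}}{P\{3 \text{ black}\}}$$

$$= \frac{\dfrac{6!}{1!\,3!\,2!}\left(\dfrac{3}{14}\right)^1 \left(\dfrac{5}{14}\right)^3 \left(\dfrac{6}{14}\right)^2}{\dfrac{6!}{3!\,3!}\left(\dfrac{5}{14}\right)^3 \left(\dfrac{9}{14}\right)^3} = \frac{4}{9}$$

$$p_{X|Y}(0 \mid 3) = \frac{8}{27}$$

$$p_{X|Y}(2 \mid 3) = \frac{2}{9}$$

$$p_{X|Y}(3 \mid 3) = \frac{1}{27}$$

$$E[X \mid Y = 1] = \frac{5}{3}$$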

13. The conditional density of $X$ given that $X > 1$ is

$$f_{X|X>1}(x) = \frac{f(x)}{P\{X > 1\}} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda}} \quad \text{when } x > 1$$

$$E[X \mid X > 1] = e^{\lambda} \int_1^\infty x \lambda e^{-\lambda x}\,dx = 1 + 1/\lambda$$

by integration by parts. This latter result also follows immediately by the lack of
memory property of the exponential.


19. $$\int E[X \mid Y = y] f_Y(y)\,dy = \iint x f_{X|Y}(x \mid y)\,dx\, f_Y(y)\,dy$$

$$= \iint x\, \frac{f(x, y)}{f_Y(y)}\,dx\, f_Y(y)\,dy$$

$$= \int x \int f(x, y)\,dy\,dx$$

$$= \int x f_X(x)\,dx$$

$$= E[X]$$

23. Let $X$ denote the first time a head appears. Let us obtain an equation for $E[N \mid X]$
by conditioning on the next two flips after $X$. This gives

$$E[N \mid X] = E[N \mid X, h, h]p^2 + E[N \mid X, h, t]pq + E[N \mid X, t, h]pq + E[N \mid X, t, t]q^2$$

where $q = 1 - p$. Now

$$E[N \mid X, h, h] = X + 1, \quad E[N \mid X, h, t] = X + 1$$

$$E[N \mid X, t, h] = X + 2, \quad E[N \mid X, t, t] = X + 2 + E[N]$$

Substituting back gives

$$E[N \mid X] = (X + 1)(p^2 + pq) + (X + 2)pq + (X + 2 + E[N])q^2$$

Taking expectations, and using the fact that $X$ is geometric with mean $1/p$, we
obtain

$$E[N] = 1 + p + q + 2pq + q^2/p + 2q^2 + q^2 E[N]$$

Solving for $E[N]$ yields

$$E[N] = \frac{2 + 2q + q^2/p}{1 - q^2}$$
38. $E[X] = E[E[X \mid Y]] = E[Y/2] = 1/4$

$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$$

$$= E[Y^2/12] + \mathrm{Var}(Y/2)$$

$$= 1/36 + 1/48 = 7/144$$

Suppose $Y$ is uniformly distributed on (0, 1), and that the conditional distribution
of $X$ given that $Y = y$ is uniform on $(0, y)$. Find $E[X]$ and $\mathrm{Var}(X)$.

41. Condition on whether any of the workers is eligible and then use symmetry. This
gives

$$P(1) = P(1 \mid \text{someone is eligible})\, P(\text{someone is eligible}) = \frac{1}{n}\left[1 - (1 - p)^n\right]$$

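42. (a) $$E[e^{tX^2}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2} e^{-(x - \mu)^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{-\left(\tfrac{1}{2} - t\right)x^2 + \mu x - \frac{\mu^2}{2}\right\}dx$$

Thus, with $\sigma^2 = (1 - 2t)^{-1}$

$$E[e^{tX^2}] = \frac{1}{\sqrt{2\pi}}\, e^{-\mu^2/2} \int_{-\infty}^{\infty} \exp\{-(x^2 - 2\sigma^2 \mu x)/2\sigma^2\}\,dx$$

Using that

$$x^2 - 2\sigma^2 \mu x = (x - \sigma^2 \mu)^2 - \mu^2 \sigma^4$$

we have

$$E[e^{tX^2}] = \frac{1}{\sqrt{2\pi}}\, e^{-\mu^2/2} e^{\mu^2 \sigma^2/2} \int_{-\infty}^{\infty} \exp\{-(x - \sigma^2 \mu)^2/2\sigma^2\}\,dx$$

$$= (1 - 2t)^{-1/2} \exp\left\{-\frac{(1 - \sigma^2)\mu^2}{2}\right\}$$

$$= (1 - 2t)^{-1/2}\, e^{t\mu^2/(1 - 2t)}$$

(b) $$E\left[\exp\left\{t \sum_{i=1}^{n} X_i^2\right\}\right] = \prod_{i=1}^{n} E[e^{tX_i^2}] = (1 - 2t)^{-n/2} \exp\left\{\frac{t}{1 - 2t} \sum_{i=1}^{n} \mu_i^2\right\}$$

(c) $$\frac{d}{dt}(1 - 2t)^{-n/2} = n(1 - 2t)^{-n/2 - 1}$$

$$\frac{d^2}{dt^2}(1 - 2t)^{-n/2} = 2n(n/2 + 1)(1 - 2t)^{-n/2 - 2}$$

Hence, if $\chi_n^2$ is chi-squared with $n$ degrees of freedom then evaluating the
preceding at $t = 0$ gives

$$E[\chi_n^2] = n, \quad \mathrm{Var}(\chi_n^2) = n^2 + 2n - n^2 = 2n$$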

(d) Conditioning on $K$ yields

$$E[e^{tW}] = \sum_{k=0}^{\infty} E[e^{tW} \mid K = k]\, e^{-\theta/2} (\theta/2)^k/k!$$

$$= \sum_{k=0}^{\infty} (1 - 2t)^{-(n + 2k)/2}\, e^{-\theta/2} (\theta/2)^k/k!$$

$$= (1 - 2t)^{-n/2} e^{-\theta/2} \sum_{k=0}^{\infty} (1 - 2t)^{-k} (\theta/2)^k/k!$$

$$= (1 - 2t)^{-n/2} e^{-\theta/2} \sum_{k=0}^{\infty} \left(\frac{\theta}{2(1 - 2t)}\right)^k \Big/ k!$$

$$= (1 - 2t)^{-n/2} \exp\left\{-\frac{\theta}{2} + \frac{\theta}{2(1 - 2t)}\right\}$$

$$= (1 - 2t)^{-n/2} \exp\left\{\frac{t\theta}{1 - 2t}\right\}$$

Because the preceding is the moment generating function of a noncentral chi-squared
random variable with parameters $n$ and $\theta$, and the moment generating
function uniquely determines the distribution, the result is proven.

(e) From the preceding, we have

$$E[W \mid K = k] = E[\chi_{n+2k}^2] = n + 2k$$

$$\mathrm{Var}(W \mid K = k) = \mathrm{Var}(\chi_{n+2k}^2) = 2n + 4k$$

Hence,

$$E[W] = E[E[W \mid K]] = E[n + 2K] = n + 2E[K] = n + \theta$$

and the conditional variance formula yields

$$\mathrm{Var}(W) = E[2n + 4K] + \mathrm{Var}(n + 2K) = 2n + 2\theta + 2\theta = 2n + 4\theta$$
43. With $I = I\{Y \in A\}$

\[
E[XI] = E[XI \mid I = 1]P(I = 1) + E[XI \mid I = 0]P(I = 0) = E[X \mid I = 1]P(I = 1)
\]

47. 
\[
E[X^2 Y^2 \mid X] = X^2 E[Y^2 \mid X] \ge X^2 (E[Y \mid X])^2 = X^2
\]

The inequality follows since for any random variable U, $E[U^2] \ge (E[U])^2$ and
this remains true when conditioning on some other random variable X. Taking
expectations of the preceding shows that

\[
E[(XY)^2] \ge E[X^2]
\]

As

\[
E[XY] = E[E[XY \mid X]] = E[X E[Y \mid X]] = E[X]
\]

the results follow.


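53. 
\[
\begin{aligned}
P\{X = n\} &= \int_0^{\infty} P\{X = n \mid \lambda\} e^{-\lambda}\,d\lambda \\
&= \int_0^{\infty} \frac{e^{-\lambda}\lambda^n}{n!} e^{-\lambda}\,d\lambda \\
&= \int_0^{\infty} e^{-2\lambda} \lambda^n \frac{d\lambda}{n!} \\
&= \int_0^{\infty} e^{-t} t^n \frac{dt}{n!} \left(\frac{1}{2}\right)^{n+1}
\end{aligned}
\]

The results follow since $\int_0^{\infty} e^{-t} t^n\,dt = \Gamma(n + 1) = n!$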

54. (a) $P_k = p^{k-1}$, which follows because X = k if the first success is immediately
followed by k − 1 additional consecutive successes.

(b) $P_n = 0$ if n < k.

\[
P_k = \sum_{i=1}^{k} p^{i-1}(1 - p) P_{k-i+1} + p^k
\]
\[
P_n = \sum_{i=1}^{k} p^{i-1}(1 - p) P_{n-i+1}, \quad n > k
\]

(c) $P_k = p^0(1 - p)P_k + p^k$
(d) $P_3 = .6^2 = .36$

$P_4 = .4P_4 + .24P_3 \Rightarrow P_4 = .144$

$P_5 = .4P_5 + .24P_4 + .144P_3 \Rightarrow P_5 = .144$

$P_6 = .4P_6 + .24P_5 + .144P_4 \Rightarrow P_6 = .64(.144)$

$P_7 = .4P_7 + .24P_6 + .144P_5 \Rightarrow P_7 = .496(.144)$

$P_8 = .4P_8 + .24P_7 + .144P_6 \Rightarrow P_8 = .352(.144) = 0.050688$


55. Conditioning on the result of the trial following the first time that there have been
k − 1 successes in a row gives

\[
M_k = M_{k-1} + p(1) + (1 - p)M_k
\]

Hence, $M_k = 1 + \frac{1}{p} M_{k-1}$, yielding that, with $a = 1/p$

\[
\begin{aligned}
M_k &= 1 + a(1 + aM_{k-2}) \\
&= 1 + a + a^2 M_{k-2} \\
&= 1 + a + a^2 + a^3 M_{k-3} \\
&= \sum_{i=0}^{k-1} a^i
\end{aligned}
\]
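The unrolling of the recursion can be checked numerically. The sketch below assumes the starting value $M_1 = 1$, which is what the closed form $\sum_{i=0}^{k-1} a^i$ implies for k = 1; the function names are illustrative.

```python
from fractions import Fraction

def mean_by_recursion(k, p):
    """Iterate M_j = 1 + M_{j-1}/p starting from M_1 = 1 (assumed base case)."""
    m = Fraction(1)
    for _ in range(k - 1):
        m = 1 + m / p
    return m

def mean_by_sum(k, p):
    """Closed form: sum of a^i for i = 0..k-1 with a = 1/p."""
    a = 1 / p
    return sum(a ** i for i in range(k))

p = Fraction(1, 2)
print([mean_by_recursion(k, p) for k in range(1, 5)])
```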


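58. (a) $r/\lambda$;

(b) $E[\mathrm{Var}(N \mid Y)] + \mathrm{Var}(E[N \mid Y]) = E[Y] + \mathrm{Var}(Y) = \frac{r}{\lambda} + \frac{r}{\lambda^2}$

(c) With $p = \frac{\lambda}{\lambda + 1}$

\[
\begin{aligned}
P(N = n) &= \int P(N = n \mid Y = y) f_Y(y)\,dy \\
&= \int_0^{\infty} e^{-y} \frac{y^n}{n!} \cdot \frac{\lambda e^{-\lambda y}(\lambda y)^{r-1}}{(r - 1)!}\,dy \\
&= \frac{\lambda^r}{n!(r - 1)!} \int_0^{\infty} e^{-(\lambda + 1)y} y^{n+r-1}\,dy \\
&= \frac{\lambda^r}{n!(r - 1)!(\lambda + 1)^{n+r}} \int_0^{\infty} e^{-x} x^{n+r-1}\,dx \\
&= \frac{\lambda^r (n + r - 1)!}{n!(r - 1)!(\lambda + 1)^{n+r}} \\
&= \binom{n + r - 1}{r - 1} p^r (1 - p)^n
\end{aligned}
\]

(d) The total number of failures before the rth success when each trial is independently
a success with probability p is distributed as X − r where X, equal to
the number of trials until the rth success, is negative binomial. Hence,

\[
P(X - r = n) = P(X = n + r) = \binom{n + r - 1}{r - 1} p^r (1 - p)^n
\]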
60. (a) Intuitive that f(p) is increasing in p, since the larger p is the greater is the
advantage of going first.

(b) 1.

(c) 1/2 since the advantage of going first becomes nil.

(d) Condition on the outcome of the first flip:

\[
\begin{aligned}
f(p) &= P\{\text{I wins} \mid h\}p + P\{\text{I wins} \mid t\}(1 - p) \\
&= p + [1 - f(p)](1 - p)
\end{aligned}
\]

Therefore,

\[
f(p) = \frac{1}{2 - p}
\]


67. Part (a) is proven by noting that a run of j successive heads can occur within the 
first n flips in two mutually exclusive ways. Either there is a run of j successive 
heads within the first n — 1 flips; or there is no run of j successive heads within the 
first n — j — 1 flips, flip n — j is not a head, and flips n — j + 1 through n are all 
heads. 

Let A be the event that a run of j successive heads occurs within the first
n, (n ≥ j), flips. Conditioning on X, the trial number of the first non-head, gives
the following

\[
\begin{aligned}
P_j(n) &= \sum_k P(A \mid X = k) p^{k-1}(1 - p) \\
&= \sum_{k=1}^{j} P(A \mid X = k) p^{k-1}(1 - p) + \sum_{k=j+1}^{\infty} P(A \mid X = k) p^{k-1}(1 - p) \\
&= \sum_{k=1}^{j} P_j(n - k) p^{k-1}(1 - p) + \sum_{k=j+1}^{\infty} p^{k-1}(1 - p) \\
&= \sum_{k=1}^{j} P_j(n - k) p^{k-1}(1 - p) + p^j
\end{aligned}
\]

73. Condition on the value of the sum prior to going over 100. In all cases the most 
likely value is 101. (For instance, if this sum is 98 then the final sum is equally 
likely to be either 101, 102, 103, or 104. If the sum prior to going over is 95, then 
the final sum is 101 with certainty.) 

84. Suppose in Example 3.32 that a point is only won if the winner of the rally was the 
server of that rally. 

(a) If A is currently serving, what is the probability that A wins the next point? 

(b) Explain how to obtain the final score probabilities. 

93. (a) By symmetry, for any value of $(T_1, \ldots, T_m)$, the random vector $(I_1, \ldots, I_m)$
is equally likely to be any of the m! permutations.

(b)
\[
\begin{aligned}
E[N] &= \sum_{i=1}^{m} E[N \mid X = i] P\{X = i\} \\
&= \frac{1}{m} \sum_{i=1}^{m} E[N \mid X = i] \\
&= \frac{1}{m} \left( \sum_{i=1}^{m-1} (E[T_i] + E[N]) + E[T_{m-1}] \right)
\end{aligned}
\]

where the final equality used the independence of X and $T_i$. Therefore,

\[
E[N] = E[T_{m-1}] + \sum_{i=1}^{m-1} E[T_i]
\]

(c)
\[
E[T_i] = \sum_{j=1}^{i} \frac{m}{m + 1 - j}
\]

(d)
\[
\begin{aligned}
E[N] &= \sum_{j=1}^{m-1} \frac{m}{m + 1 - j} + \sum_{i=1}^{m-1} \sum_{j=1}^{i} \frac{m}{m + 1 - j} \\
&= \sum_{j=1}^{m-1} \frac{m}{m + 1 - j} + \sum_{j=1}^{m-1} \sum_{i=j}^{m-1} \frac{m}{m + 1 - j} \\
&= \sum_{j=1}^{m-1} \left( \frac{m}{m + 1 - j} + \frac{m(m - j)}{m + 1 - j} \right) \\
&= m(m - 1)
\end{aligned}
\]
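The final simplification in part (d) can be confirmed for small m with exact arithmetic. This sketch only evaluates the sums from parts (b) and (c); the function names are illustrative.

```python
from fractions import Fraction

def expected_t(i, m):
    """E[T_i] = sum over j = 1..i of m / (m + 1 - j), from part (c)."""
    return sum(Fraction(m, m + 1 - j) for j in range(1, i + 1))

def expected_n(m):
    """E[N] = E[T_{m-1}] + sum over i = 1..m-1 of E[T_i], from part (b)."""
    return expected_t(m - 1, m) + sum(expected_t(i, m) for i in range(1, m))

print([expected_n(m) for m in range(2, 7)])
```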


97. Let X be geometric with parameter p. To compute Var(X), we will use the conditional
variance formula, conditioning on the outcome of the first trial. Let I equal
1 if the first trial is a success, and let it equal 0 otherwise. If I = 1, then X = 1;
since the variance of a constant is 0, this gives

\[
\mathrm{Var}(X \mid I = 1) = 0
\]

On the other hand, if I = 0 then the conditional distribution of X given that I = 0
is the same as the unconditional distribution of 1 (the first trial) plus a geometric
with parameter p (the number of additional trials needed for a success). Therefore,

\[
\mathrm{Var}(X \mid I = 0) = \mathrm{Var}(X)
\]

yielding

\[
E[\mathrm{Var}(X \mid I)] = \mathrm{Var}(X \mid I = 1)P(I = 1) + \mathrm{Var}(X \mid I = 0)P(I = 0) = (1 - p)\mathrm{Var}(X)
\]

Similarly,

\[
E[X \mid I = 1] = 1, \quad E[X \mid I = 0] = 1 + E[X] = 1 + \frac{1}{p}
\]

which can be written as

\[
E[X \mid I] = 1 + \frac{1}{p}(1 - I)
\]

yielding

\[
\mathrm{Var}(E[X \mid I]) = \frac{1}{p^2}\mathrm{Var}(I) = \frac{1}{p^2} p(1 - p) = \frac{1 - p}{p}
\]

The conditional variance formula now gives

\[
\begin{aligned}
\mathrm{Var}(X) &= E[\mathrm{Var}(X \mid I)] + \mathrm{Var}(E[X \mid I]) \\
&= (1 - p)\mathrm{Var}(X) + \frac{1 - p}{p}
\end{aligned}
\]

or

\[
\mathrm{Var}(X) = \frac{1 - p}{p^2}
\]
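The closed form $(1 - p)/p^2$ can be compared against a direct, truncated evaluation of the geometric moments. This is only a numerical sketch: the truncation point `terms` and the value of p are arbitrary choices.

```python
def geometric_moments(p, terms=2000):
    """Mean and variance of a geometric(p) variable from a truncated series."""
    q = 1.0 - p
    mean = sum(n * p * q ** (n - 1) for n in range(1, terms + 1))
    second = sum(n * n * p * q ** (n - 1) for n in range(1, terms + 1))
    return mean, second - mean ** 2

p = 0.3
mean, variance = geometric_moments(p)
print(mean, variance, (1 - p) / p ** 2)
```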

98. $E[NS] = E[E[NS \mid N]] = E[N E[S \mid N]] = E[N^2 E[X]] = E[X]E[N^2]$. Hence,

\[
\mathrm{Cov}(N, S) = E[X]E[N^2] - (E[N])^2 E[X] = E[X]\mathrm{Var}(N)
\]

99. (a) $p^k$,

(b) In order for N = k + r the pattern must not have occurred in the first r − 1
trials, trial r must be a failure, and trials r + 1, ..., r + k must all be successes.

(c)
\[
1 - P(N = k) = \sum_{r=1}^{\infty} P(N = k + r) = \sum_{r=1}^{\infty} P(N > r - 1)\,q p^k = E[N]\,q p^k
\]

Chapter 4 

1 . Pqj = 1 , Pio = g, Pl\ = g, P 32 — 1 
/’ll = 5 , ^ 22=5 

P\2= 5 , Pl3 = 5 

4. Let the state space be $S = \{0, 1, 2, \bar{0}, \bar{1}, \bar{2}\}$, where state $i$ ($\bar{i}$) signifies that the
present value is i, and the present day is even (odd).

9. $P^{10}_{0,3} = .5078$.

In a sequence of independent flips of a fair coin, what is the probability that there
is a run of 3 consecutive heads within the first 10 flips?
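A minimal sketch of this computation tracks the current streak of heads as a four-state chain in which state 3 (a completed run) is absorbing. With heads probability 1/2, which is the case that reproduces the value .5078, the exact answer is 65/128. The function name and its parameters are illustrative.

```python
from fractions import Fraction

def prob_run(p, run_len, flips):
    """P(a run of run_len heads occurs within the given number of flips)."""
    # state[s] = P(no run so far and the current trailing streak of heads is s)
    state = [Fraction(0)] * run_len
    state[0] = Fraction(1)
    done = Fraction(0)
    for _ in range(flips):
        new = [Fraction(0)] * run_len
        for s, mass in enumerate(state):
            new[0] += mass * (1 - p)        # a tail resets the streak
            if s + 1 == run_len:
                done += mass * p            # a head completes the run
            else:
                new[s + 1] += mass * p      # a head extends the streak
        state = new
    return done

answer = prob_run(Fraction(1, 2), 3, 10)
print(answer, float(answer))
```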
16. If $P_{ij}$ were (strictly) positive, then $P^n_{ji}$ would be 0 for all n (otherwise, i and j
would communicate). But then the process, starting in i, has a positive probability
of at least $P_{ij}$ of never returning to i. This contradicts the recurrence of i. Hence
$P_{ij} = 0$.

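21. The transition probabilities are

\[
P_{i,j} = \begin{cases} 1 - 3\alpha, & \text{if } j = i \\ \alpha, & \text{if } j \ne i \end{cases}
\]

By symmetry,

\[
P^n_{ij} = \frac{1}{3}(1 - P^n_{ii}), \quad j \ne i
\]

So, let us prove by induction that

\[
P^n_{i,j} = \begin{cases} \frac{1}{4} + \frac{3}{4}(1 - 4\alpha)^n, & \text{if } j = i \\ \frac{1}{4} - \frac{1}{4}(1 - 4\alpha)^n, & \text{if } j \ne i \end{cases}
\]

As the preceding is true for n = 1, assume it for n. To complete the induction
proof, we need to show that

\[
P^{n+1}_{i,j} = \begin{cases} \frac{1}{4} + \frac{3}{4}(1 - 4\alpha)^{n+1}, & \text{if } j = i \\ \frac{1}{4} - \frac{1}{4}(1 - 4\alpha)^{n+1}, & \text{if } j \ne i \end{cases}
\]

Now,

\[
\begin{aligned}
P^{n+1}_{i,i} &= P^n_{i,i} P_{i,i} + \sum_{j \ne i} P^n_{i,j} P_{j,i} \\
&= \left(\frac{1}{4} + \frac{3}{4}(1 - 4\alpha)^n\right)(1 - 3\alpha) + 3\left(\frac{1}{4} - \frac{1}{4}(1 - 4\alpha)^n\right)\alpha \\
&= \frac{1}{4} + \frac{3}{4}(1 - 4\alpha)^n (1 - 3\alpha - \alpha) \\
&= \frac{1}{4} + \frac{3}{4}(1 - 4\alpha)^{n+1}
\end{aligned}
\]

By symmetry, for $j \ne i$

\[
P^{n+1}_{ij} = \frac{1}{3}(1 - P^{n+1}_{ii}) = \frac{1}{4} - \frac{1}{4}(1 - 4\alpha)^{n+1}
\]

and the induction is complete.

By letting $n \to \infty$ in the preceding, or by using that the transition probability
matrix is doubly stochastic, or by just using a symmetry argument, we obtain that
$\pi_i = 1/4$, i = 1, 2, 3, 4.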
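The induction can be cross-checked by multiplying out the four-state transition matrix directly. This sketch uses exact fractions and an arbitrary illustrative value of α; the helper names are not from the text.

```python
from fractions import Fraction

def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def n_step_matrix(alpha, n):
    """n-step transition matrix for P_ii = 1 - 3*alpha and P_ij = alpha (j != i)."""
    p = [[1 - 3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
    result = [[Fraction(int(i == j)) for j in range(4)] for i in range(4)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result

def closed_form(alpha, n, same_state):
    r = (1 - 4 * alpha) ** n
    if same_state:
        return Fraction(1, 4) + Fraction(3, 4) * r
    return Fraction(1, 4) - Fraction(1, 4) * r

alpha = Fraction(1, 10)  # arbitrary illustrative value
pn = n_step_matrix(alpha, 5)
print(pn[0][0], closed_form(alpha, 5, True))
```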
27. (a) It is a Markov chain because each individual's state the next period depends
only on its current state and not on any information about earlier times.

(b) If i of the N individuals are currently active, then the number of actives in the
next period is the sum of two independent random variables: $R_i$, the number
of the i currently active who remain active in the next period; and $B_i$, the
number of the N − i inactives who become active in the next period. Because
$R_i$ is binomial (i, α), and $B_i$ is binomial (N − i, 1 − β), we see
that

\[
E[X_n \mid X_{n-1} = i] = i\alpha + (N - i)(1 - \beta) = N(1 - \beta) + (\alpha + \beta - 1)i
\]

Hence,

\[
E[X_n \mid X_{n-1}] = N(1 - \beta) + (\alpha + \beta - 1)X_{n-1}
\]

giving that

\[
E[X_n] = N(1 - \beta) + (\alpha + \beta - 1)E[X_{n-1}]
\]

Letting $a = N(1 - \beta)$, $b = \alpha + \beta - 1$, the preceding gives

\[
\begin{aligned}
E[X_n] &= a + bE[X_{n-1}] \\
&= a + b(a + bE[X_{n-2}]) = a + ba + b^2 E[X_{n-2}] \\
&= a + ba + b^2 a + b^3 E[X_{n-3}]
\end{aligned}
\]

Continuing this, we arrive at

\[
E[X_n] = a(1 + b + \cdots + b^{n-1}) + b^n E[X_0]
\]

Thus,

\[
E[X_n] = \frac{a(1 - b^n)}{1 - b} + b^n E[X_0]
\]

Note that

\[
\lim_{n \to \infty} E[X_n] = \frac{a}{1 - b} = \frac{N(1 - \beta)}{2 - \alpha - \beta}
\]

(c) With $R_i$, $B_i$ as previously defined

\[
\begin{aligned}
P_{ij} &= P(R_i + B_i = j) \\
&= \sum_k P(B_i = j - k)\,P(R_i = k) \\
&= \sum_k \binom{N - i}{j - k} (1 - \beta)^{j-k} \beta^{N-i-j+k} \binom{i}{k} \alpha^k (1 - \alpha)^{i-k}
\end{aligned}
\]

where $\binom{m}{r} = 0$ if r < 0 or r > m.
(d) Suppose N = 1. Then, with 1 standing for active and 0 for inactive, the
limiting probabilities are such that

\[
\begin{aligned}
\pi_0 &= \pi_0 \beta + \pi_1 (1 - \alpha) \\
\pi_1 &= \pi_0 (1 - \beta) + \pi_1 \alpha \\
\pi_0 + \pi_1 &= 1
\end{aligned}
\]

Solving yields

\[
\pi_1 = \frac{1 - \beta}{2 - \alpha - \beta}, \quad \pi_0 = \frac{1 - \alpha}{2 - \alpha - \beta}
\]

Now consider the case of population size N. Because each member will, in
steady state, be active with probability $\pi_1$ and because each of the members
changes states independently of each other it follows that the steady state
number of actives has a binomial (N, $\pi_1$) distribution. Hence, the long-run
proportion of time that exactly j people are active is

\[
\pi_j(N) = \binom{N}{j} \left(\frac{1 - \beta}{2 - \alpha - \beta}\right)^j \left(\frac{1 - \alpha}{2 - \alpha - \beta}\right)^{N-j}
\]

Note that the steady state expected number of actives is $N\frac{1 - \beta}{2 - \alpha - \beta}$, in accord
with what we saw in part (b).
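The claim that the binomial distribution is stationary can be verified exactly for a small population. The sketch below builds the transition probabilities as a convolution of two binomials, with α the probability that an active individual stays active and β the probability that an inactive one stays inactive, and then checks $\pi P = \pi$. The parameter values and function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def transition_matrix(n_pop, alpha, beta):
    """P[i][j] = P(R_i + B_i = j), R_i ~ bin(i, alpha), B_i ~ bin(n_pop - i, 1 - beta)."""
    matrix = []
    for i in range(n_pop + 1):
        row = []
        for j in range(n_pop + 1):
            total = Fraction(0)
            for k in range(min(i, j) + 1):
                joins = j - k                      # inactives that become active
                if joins > n_pop - i:
                    continue
                stay = comb(i, k) * alpha ** k * (1 - alpha) ** (i - k)
                join = comb(n_pop - i, joins) * (1 - beta) ** joins * beta ** (n_pop - i - joins)
                total += stay * join
            row.append(total)
        matrix.append(row)
    return matrix

def binomial_stationary(n_pop, alpha, beta):
    pi1 = (1 - beta) / (2 - alpha - beta)
    return [comb(n_pop, j) * pi1 ** j * (1 - pi1) ** (n_pop - j) for j in range(n_pop + 1)]

n_pop, alpha, beta = 4, Fraction(1, 3), Fraction(3, 4)
p = transition_matrix(n_pop, alpha, beta)
pi = binomial_stationary(n_pop, alpha, beta)
next_pi = [sum(pi[i] * p[i][j] for i in range(n_pop + 1)) for j in range(n_pop + 1)]
print(next_pi == pi)
```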

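32. With the state being the number of on switches this is a three-state Markov chain.
The equations for the long-run proportions are

\[
\begin{aligned}
\pi_0 &= \frac{9}{16}\pi_0 + \frac{1}{4}\pi_1 + \frac{1}{16}\pi_2, \\
\pi_1 &= \frac{3}{8}\pi_0 + \frac{1}{2}\pi_1 + \frac{3}{8}\pi_2, \\
\pi_0 + \pi_1 + \pi_2 &= 1
\end{aligned}
\]

This gives the solution

\[
\pi_0 = \frac{2}{7}, \quad \pi_1 = \frac{3}{7}, \quad \pi_2 = \frac{2}{7}
\]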

38. (a) $.4p + .6p^2$;

(b) $P^2_{2,1} + 2P^2_{2,2} = .32 + 1.36 = 1.68$,

(c) $\pi_1 p + \pi_2 p^2 = (1/3)p + (2/3)p^2$

Capa plays either one or two chess games every day, with the number of games that
she plays on successive days being a Markov chain with transition probabilities

\[
P_{1,1} = .2, \quad P_{1,2} = .8, \quad P_{2,1} = .4, \quad P_{2,2} = .6
\]

Capa wins each game with probability p. Suppose she plays 2 games on Monday.

(a) What is the probability that she wins all the games she plays on Tuesday?

(b) What is the expected number of games that she plays on Wednesday?

(c) In the long run, on what proportion of days does Capa win all her games?
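The three answers can be reproduced from the transition probabilities with a few lines of exact arithmetic. This is a sketch; the dictionary layout and names are illustrative.

```python
from fractions import Fraction

# One-step transition probabilities for the number of games played (1 or 2)
P = {1: {1: Fraction(1, 5), 2: Fraction(4, 5)},
     2: {1: Fraction(2, 5), 2: Fraction(3, 5)}}

def two_step(i, j):
    """Two-step transition probability P^2_{i,j}."""
    return sum(P[i][k] * P[k][j] for k in (1, 2))

def win_all(prob_one_game, prob_two_games, p):
    """P(win every game that day) given the distribution of games played."""
    return prob_one_game * p + prob_two_games * p ** 2

# (b) expected number of games on Wednesday, starting from 2 games on Monday
expected_wednesday = 1 * two_step(2, 1) + 2 * two_step(2, 2)

# (c) stationary probabilities of the two-state chain
pi1 = P[2][1] / (P[1][2] + P[2][1])
pi2 = 1 - pi1

print(two_step(2, 1), two_step(2, 2), expected_wednesday, pi1, pi2)
```

For part (a), `win_all(P[2][1], P[2][2], p)` gives $.4p + .6p^2$, and for part (c), `win_all(pi1, pi2, p)` gives $(1/3)p + (2/3)p^2$.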

39. (a) Follows by symmetry because in any state there are an infinite number of
states that are smaller and an infinite number that are larger and at each stage
one is equally likely to go either to the next higher or next lower state.

(b) $\sum_i \pi_i = \sum_i \pi_1$, which is 0 if $\pi_1 = 0$ or is ∞ if $\pi_1 > 0$.

(c) Because there is no solution of $\sum_i \pi_i = 1$, we can conclude that $\pi_i = \pi_1 = 0$
and so the chain is null recurrent.

40. The chain is doubly stochastic and so $\pi_i = 1/12$. Hence, $1/\pi_i = 12$.

41.
\[
e_j = \sum_{i=0}^{j-1} P(\text{enters } j \text{ directly from } i) = \sum_{i=0}^{j-1} e_i P_{i,j}
\]

$e_1 = 1/3$

$e_2 = 1/3 + 1/3(1/3) = 4/9$

$e_3 = 1/3 + 1/3(1/3) + 4/9(1/3) = 16/27$

$e_4 = 1/3(1/3) + 4/9(1/3) + 16/27(1/3) = 37/81$

$e_5 = 4/9(1/3) + 16/27(1/3) + 37/81(1/3) = 121/243$

47. $\{Y_n, n \ge 1\}$ is a Markov chain with states (i, j).

\[
P_{(i,j),(k,l)} = \begin{cases} 0, & \text{if } j \ne k \\ P_{jl}, & \text{if } j = k \end{cases}
\]

where $P_{jl}$ is the transition probability for $\{X_n\}$.

\[
\begin{aligned}
\lim_{n \to \infty} P\{Y_n = (i, j)\} &= \lim_n P\{X_n = i, X_{n+1} = j\} \\
&= \lim_n [P\{X_n = i\} P_{ij}] \\
&= \pi_i P_{ij}
\end{aligned}
\]
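The recursion $e_j = \sum_i e_i P_{i,j}$ of Exercise 41 is easy to evaluate exactly. The sketch below assumes, as the displayed arithmetic suggests, that the chain starts in state 0 (so $e_0 = 1$) and moves up by 1, 2, or 3 with probability 1/3 each; the function name and the `max_jump` parameter are illustrative.

```python
from fractions import Fraction

def visit_probabilities(last_state, max_jump=3):
    """e[j] = probability that state j is ever entered, with e[0] = 1."""
    e = [Fraction(1)]
    for j in range(1, last_state + 1):
        # state j can be entered directly from the max_jump states just below it
        predecessors = range(max(0, j - max_jump), j)
        e.append(sum(e[i] for i in predecessors) / max_jump)
    return e

print(visit_probabilities(5))
```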

60. (a) Let $P_i$ be the probability that state 3 is entered before state 4 given the initial
state is i, i = 1, 2. Then, conditioning yields

\[
\begin{aligned}
P_1 &= .4P_1 + .3P_2 + .2 \\
P_2 &= .2P_1 + .2P_2 + .2
\end{aligned}
\]

yielding that $P_1 = 11/21$.

(b) Let $m_i$ denote the mean number of transitions until either state 3 or state 4
is entered, starting in state i. Then

\[
\begin{aligned}
m_1 &= 1 + .4m_1 + .3m_2 \\
m_2 &= 1 + .2m_1 + .2m_2
\end{aligned}
\]

yielding that $m_1 = 55/21$.
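Both pairs of equations have the same coefficient matrix after moving the unknowns to the left-hand side, so one small exact solver handles both. This is a sketch using Cramer's rule; the helper name is illustrative.

```python
from fractions import Fraction as F

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

# Rearranged: .6x - .3y = b1 and -.2x + .8y = b2
coeffs = (F(6, 10), F(-3, 10), F(-2, 10), F(8, 10))

p1, p2 = solve_2x2(*coeffs, F(2, 10), F(2, 10))   # part (a)
m1, m2 = solve_2x2(*coeffs, F(1), F(1))           # part (b)
print(p1, p2, m1, m2)
```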
62. It is easy to verify that the stationary probabilities are $\pi_i = \frac{1}{n + 1}$. Hence, the mean
time to return to the initial position is n + 1.

68. (a)
\[
\sum_i \pi_i Q_{ij} = \sum_i \pi_j P_{ji} = \pi_j \sum_i P_{ji} = \pi_j
\]

(b) Whether perusing the sequence of states in the forward direction of time or
in the reverse direction, the proportion of time the state is i will be the same.

Chapter 5 

5. $P(Y = n) = P(n - 1 < X < n) = e^{-\lambda(n-1)} - e^{-\lambda n} = (e^{-\lambda})^{n-1}(1 - e^{-\lambda})$

7.
\[
\begin{aligned}
P\{X_1 < X_2 \mid \min(X_1, X_2) = t\}
&= \frac{P\{X_1 < X_2, \min(X_1, X_2) = t\}}{P\{\min(X_1, X_2) = t\}} \\
&= \frac{P\{X_1 = t, X_2 > t\}}{P\{X_1 = t, X_2 > t\} + P\{X_2 = t, X_1 > t\}} \\
&= \frac{f_1(t)[1 - F_2(t)]}{f_1(t)[1 - F_2(t)] + f_2(t)[1 - F_1(t)]}
\end{aligned}
\]

Dividing through by $[1 - F_1(t)][1 - F_2(t)]$ yields the result. (Of course, $f_i$ and $F_i$
are the density and distribution function of $X_i$, i = 1, 2.) To make the preceding
derivation rigorous, we should replace "= t" by $\in (t, t + \epsilon)$ throughout and then
let $\epsilon \to 0$.

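8. Exponential with rate $\lambda + \mu$.

(a)
\[
E[MX \mid M = X] = E[M^2 \mid M = X] = E[M^2] = \frac{2}{(\lambda + \mu)^2}
\]

(b) By the memoryless property of exponentials, given that M = Y, X is distributed
as M + X′ where X′ is an exponential with rate λ that is independent
of M. Therefore,

\[
\begin{aligned}
E[MX \mid M = Y] &= E[M(M + X')] \\
&= E[M^2] + E[M]E[X'] \\
&= \frac{2}{(\lambda + \mu)^2} + \frac{1}{\lambda(\lambda + \mu)}
\end{aligned}
\]

(c)
\[
E[MX] = E[MX \mid M = X]\frac{\lambda}{\lambda + \mu} + E[MX \mid M = Y]\frac{\mu}{\lambda + \mu}
= \frac{2\lambda + \mu}{\lambda(\lambda + \mu)^2}
\]

Therefore,

\[
\mathrm{Cov}(X, M) = \frac{2\lambda + \mu}{\lambda(\lambda + \mu)^2} - \frac{1}{\lambda(\lambda + \mu)} = \frac{\lambda}{\lambda(\lambda + \mu)^2}
\]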


1 4 /r, \ '-rr _ ^b _ l ! h 

^ V-a+^b H-a+^b 

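(b) Let F be the time of the first departure. Write F = T + A where T is the
time of the first arrival and A is the additional time from then until the first
departure. First take expectations and then condition on who arrives first to
obtain

\[
E[F] = \frac{1}{\lambda_a + \lambda_b} + E[A \mid a]\frac{\lambda_a}{\lambda_a + \lambda_b} + E[A \mid b]\frac{\lambda_b}{\lambda_a + \lambda_b}
\]

Now use

\[
E[A \mid a] = \frac{1}{\mu_a + \lambda_b} + \frac{\lambda_b}{\mu_a + \lambda_b} \cdot \frac{1}{\mu_a + \mu_b}
\]

and

\[
E[A \mid b] = \frac{1}{\mu_b + \lambda_a} + \frac{\lambda_a}{\mu_b + \lambda_a} \cdot \frac{1}{\mu_a + \mu_b}
\]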


18. (a) $1/(2\mu)$.

(b) $1/(4\mu^2)$, since the variance of an exponential is its mean squared.

(c) and (d). By the lack of memory property of the exponential it follows that A,
the amount by which $X_{(2)}$ exceeds $X_{(1)}$, is exponentially distributed with rate
μ and is independent of $X_{(1)}$. Therefore,

\[
E[X_{(2)}] = E[X_{(1)} + A] = \frac{1}{2\mu} + \frac{1}{\mu}
\]
\[
\mathrm{Var}(X_{(2)}) = \mathrm{Var}(X_{(1)} + A) = \frac{1}{4\mu^2} + \frac{1}{\mu^2} = \frac{5}{4\mu^2}
\]


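19. Using that the winning time is exponential with rate $r = \lambda_a + \lambda_b$, independent
of who wins, gives that with X equal to the amount that runner A wins

\[
E[X] = \frac{\lambda_a}{\lambda_a + \lambda_b} R \int_0^{\infty} e^{-\alpha t} r e^{-rt}\,dt = R \frac{\lambda_a}{\lambda_a + \lambda_b} \cdot \frac{r}{r + \alpha}
\]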

23. (a) 

(b) (Whenever battery 1 is in use and a failure occurs the probability is ^ 
that it is not battery 1 that has failed. 

(c) (i ) n ~ i+l ,i > 1. 

(d) T is the sum of n — 1 independent exponentials with rate 2/x (since each time 
a failure occurs the time until the next failure is exponential with rate 2/x). 

(e) Gamma with parameters n — 1 and 2/x. 
34 fc) __ | _ M B _ H-A

^ ^•+£*'A+A t J3 ^+^5 A.+MA+Mfl ^+MA 

tdi _ - _ - 

V ' A+Z^a+MB a +IIB 

35. The axioms are immediate. For instance,

\[
P(N_s(t + h) - N_s(t) = 1) = P(N(s + t + h) - N(s + t) = 1) = \lambda h + o(h)
\]


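36.
\[
\begin{aligned}
E[S(t) \mid N(t) = n] &= sE\left[\prod_{i=1}^{N(t)} X_i \,\Big|\, N(t) = n\right] \\
&= sE\left[\prod_{i=1}^{n} X_i \,\Big|\, N(t) = n\right] \\
&= sE\left[\prod_{i=1}^{n} X_i\right] \\
&= s(E[X])^n \\
&= s(1/\mu)^n
\end{aligned}
\]

Thus,

\[
\begin{aligned}
E[S(t)] &= s \sum_n (1/\mu)^n e^{-\lambda t}(\lambda t)^n / n! \\
&= s e^{-\lambda t} \sum_n (\lambda t/\mu)^n / n! \\
&= s e^{-\lambda t + \lambda t/\mu}
\end{aligned}
\]

By the same reasoning

\[
E[S^2(t) \mid N(t) = n] = s^2 (E[X^2])^n = s^2 (2/\mu^2)^n
\]

and

\[
E[S^2(t)] = s^2 e^{-\lambda t + 2\lambda t/\mu^2}
\]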

40. The easiest way is to use Definition 5.3. It is easy to see that $\{N(t), t \ge 0\}$
will also possess stationary and independent increments. Since the sum of two
independent Poisson random variables is also Poisson, it follows that N(t) is a
Poisson random variable with mean $(\lambda_1 + \lambda_2)t$.

57. (a) e~ 2 . 

(b) 2 P.M. 

(c) 1 - 5e~ 4 . 

58. (a) It has the distribution of N(t) + 1 where $\{N(t), t \ge 0\}$ is a Poisson process
with rate λ. Hence, it is distributed as 1 plus a Poisson with mean λt.

(b) $E[p^N] = \sum_{n=0}^{\infty} p^{n+1} e^{-\lambda t}(\lambda t)^n / n! = p e^{-\lambda t(1 - p)}$

(c) t + [ 

(d) pe-^-PXt + {) 
60. (a) g.

(b) §. 

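64. (a) Since, given N(t), each arrival is uniformly distributed on (0, t) it follows
that

\[
E[X \mid N(t)] = N(t) \int_0^t (t - s)\frac{ds}{t} = N(t)\frac{t}{2}
\]

(b) Let $U_1, U_2, \ldots$ be independent uniform (0, t) random variables. Then

\[
\mathrm{Var}(X \mid N(t) = n) = \mathrm{Var}\left[\sum_{i=1}^{n} (t - U_i)\right] = n\,\mathrm{Var}(U_i) = n\frac{t^2}{12}
\]

(c) By parts (a) and (b) and the conditional variance formula,

\[
\mathrm{Var}(X) = \mathrm{Var}\left(\frac{N(t)t}{2}\right) + E\left[\frac{N(t)t^2}{12}\right]
= \frac{\lambda t\, t^2}{4} + \frac{\lambda t\, t^2}{12} = \frac{\lambda t^3}{3}
\]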

69. Poisson with mean X /,' +?0 F(y)dy + X / 0 r ° F(y)dy. 

*79. It is a nonhomogeneous Poisson process with intensity function $p(t)\lambda(t)$, t > 0.

84. There is a record whose value is between t and t + dt if the first X larger than t
lies between t and t + dt. From this we see that, independent of all record values
less than t, there will be one between t and t + dt with probability $\lambda(t)\,dt$ where
$\lambda(t)$ is the failure rate function given by

\[
\lambda(t) = \frac{f(t)}{1 - F(t)}
\]

Since the counting process of record values has, by the preceding, independent
increments we can conclude (since there cannot be multiple record values because
the $X_i$ are continuous) that it is a nonhomogeneous Poisson process with intensity
function $\lambda(t)$. When f is the exponential density, $\lambda(t) = \lambda$ and so the counting
process of record values becomes an ordinary Poisson process with rate λ.

91. To begin, note that

\[
\begin{aligned}
P\left\{X_1 > \sum_{j=2}^{n} X_j\right\} = {} & P\{X_1 > X_2\}\,P\{X_1 - X_2 > X_3 \mid X_1 > X_2\} \\
& \times P\{X_1 - X_2 - X_3 > X_4 \mid X_1 > X_2 + X_3\} \cdots \\
& \times P\{X_1 - X_2 - \cdots - X_{n-1} > X_n \mid X_1 > X_2 + \cdots + X_{n-1}\}
\end{aligned}
\]

by lack of memory

Hence,


l ; = 1 J 1 = 1 ;•/( ^ 


98. (a) Start with

\[
D(t + h) = D(t) + D(t, t + h)
\]

where D(t, t + h) is the discounted value of claims that occur between times
t and t + h. Take expectations, and then condition on X, the number of claims
made between times t and t + h, to obtain

\[
\begin{aligned}
M(t + h) &= M(t) + E[D(t, t + h) \mid X = 1]\lambda h + o(h) \\
&= M(t) + \mu e^{-\alpha t}\lambda h + o(h)
\end{aligned}
\]

(b)
\[
\frac{M(t + h) - M(t)}{h} = \mu e^{-\alpha t}\lambda + \frac{o(h)}{h}
\]

and let h go to 0.

(c) This is immediate upon integration.

99.
\[
P(X > t \mid M_1 = y) = \exp\left\{-\int_0^t (\lambda + y e^{-\alpha s})\,ds\right\} = \exp\left\{-\lambda t - \frac{y}{\alpha}(1 - e^{-\alpha t})\right\}
\]

Hence,

\[
P(X > t) = \int_0^{\infty} \exp\left\{-\lambda t - \frac{y}{\alpha}(1 - e^{-\alpha t})\right\} f(y)\,dy
\]

where f is the density function of the value of a mark.


Chapter 6 


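2. Let $N_A(t)$ be the number of organisms in state A and let $N_B(t)$ be the number of
organisms in state B. Then $\{N_A(t), N_B(t)\}$ is a continuous-time Markov chain with

\[
\begin{aligned}
\nu_{(n,m)} &= \alpha n + \beta m \\
P_{(n,m),(n-1,m+1)} &= \frac{\alpha n}{\alpha n + \beta m} \\
P_{(n,m),(n+2,m-1)} &= \frac{\beta m}{\alpha n + \beta m}
\end{aligned}
\]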

4. Let N(t) denote the number of customers in the station at time t. Then {N(t)} is a
birth and death process with

\[
\lambda_n = \lambda\alpha_n, \quad \mu_n = \mu
\]
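7. (a) Yes!

(b) For $n = (n_1, \ldots, n_i, n_{i+1}, \ldots, n_{k-1})$ let

\[
\begin{aligned}
S_i(n) &= (n_1, \ldots, n_i - 1, n_{i+1} + 1, \ldots, n_{k-1}), \quad i = 1, \ldots, k - 2 \\
S_{k-1}(n) &= (n_1, \ldots, n_i, n_{i+1}, \ldots, n_{k-1} - 1), \\
S_0(n) &= (n_1 + 1, \ldots, n_i, n_{i+1}, \ldots, n_{k-1}).
\end{aligned}
\]

Then

\[
\begin{aligned}
q_{n, S_i(n)} &= n_i \mu, \quad i = 1, \ldots, k - 1 \\
q_{n, S_0(n)} &= \lambda
\end{aligned}
\]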

11. (b) Follows from the hint about using the lack of memory property and the fact
that $S_j$, the minimum of j − (i − 1) independent exponentials with rate λ, is
exponential with rate (j − i + 1)λ.

(c) From parts (a) and (b)

\[
P\{T_1 + \cdots + T_j \le t\} = P\left\{\max_{1 \le i \le j} X_i \le t\right\} = (1 - e^{-\lambda t})^j
\]

(d) With all probabilities conditional on X(0) = 1,

\[
\begin{aligned}
P_{1j}(t) &= P\{X(t) = j\} \\
&= P\{X(t) \ge j\} - P\{X(t) \ge j + 1\} \\
&= P\{T_1 + \cdots + T_j \le t\} - P\{T_1 + \cdots + T_{j+1} \le t\}
\end{aligned}
\]

(e) The sum of i independent geometrics, each having parameter $p = e^{-\lambda t}$, is a
negative binomial with parameters i, p. The result follows since starting with
an initial population of i is equivalent to having i independent Yule processes,
each starting with a single individual.

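16. Let the state be

2: an acceptable molecule is attached
0: no molecule attached
1: an unacceptable molecule is attached.

Then, this is a birth and death process with balance equations

\[
\begin{aligned}
\mu_1 P_1 &= \lambda(1 - \alpha)P_0 \\
\mu_2 P_2 &= \lambda\alpha P_0
\end{aligned}
\]

Since $\sum_{i=0}^{2} P_i = 1$, we get

\[
P_2 = \left[1 + \frac{\mu_2}{\lambda\alpha} + \frac{1 - \alpha}{\alpha}\frac{\mu_2}{\mu_1}\right]^{-1}
= \frac{\lambda\alpha\mu_1}{\lambda\alpha\mu_1 + \mu_1\mu_2 + \lambda(1 - \alpha)\mu_2}
\]

where $P_2$ is the percentage of time the site is occupied by an acceptable molecule.
The percentage of time the site is occupied by an unacceptable molecule is

\[
P_1 = \frac{1 - \alpha}{\alpha}\frac{\mu_2}{\mu_1} P_2
= \frac{\lambda(1 - \alpha)\mu_2}{\lambda\alpha\mu_1 + \mu_1\mu_2 + \lambda(1 - \alpha)\mu_2}
\]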
19. There are four states. Let state 0 mean that no machines are down, state 1 that
machine 1 is down and 2 is up, state 2 that machine 1 is up and 2 is down, and state
3 that both machines are down. The balance equations are as follows:

\[
\begin{aligned}
(\lambda_1 + \lambda_2)P_0 &= \mu_1 P_1 + \mu_2 P_2 \\
(\mu_1 + \lambda_2)P_1 &= \lambda_1 P_0 \\
(\lambda_1 + \mu_2)P_2 &= \lambda_2 P_0 + \mu_1 P_3 \\
\mu_1 P_3 &= \lambda_2 P_1 + \lambda_1 P_2 \\
P_0 + P_1 + P_2 + P_3 &= 1
\end{aligned}
\]

The equations are easily solved and the proportion of time machine 2 is down is
$P_2 + P_3$.

24. We will let the state be the number of taxis waiting. Then, we get a birth and death
process with $\lambda_n = 1$, $\mu_n = 2$. This is an M/M/1. Therefore:

(a) Average number of taxis waiting $= \frac{\lambda}{\mu - \lambda} = \frac{1}{2 - 1} = 1$.

(b) The proportion of arriving customers that gets taxis is the proportion of arriving
customers that find at least one taxi waiting. The rate of arrival of such
customers is $2(1 - P_0)$. The proportion of such arrivals is therefore

\[
\frac{2(1 - P_0)}{2} = 1 - P_0 = 1 - \left(1 - \frac{\lambda}{\mu}\right) = \frac{\lambda}{\mu} = \frac{1}{2}
\]


28. Let $P^x_{ij}$, $v^x_i$ denote the parameters of the X(t) and $P^y_{ij}$, $v^y_i$ of the Y(t) process; and
let the limiting probabilities be $P^x_i$, $P^y_i$, respectively. By independence we have
that for the Markov chain {X(t), Y(t)} its parameters are

\[
\begin{aligned}
v_{(i,l)} &= v^x_i + v^y_l, \\
P_{(i,l),(j,l)} &= \frac{v^x_i}{v^x_i + v^y_l} P^x_{ij}, \\
P_{(i,l),(i,k)} &= \frac{v^y_l}{v^x_i + v^y_l} P^y_{lk},
\end{aligned}
\]

and

\[
\lim_{t \to \infty} P\{(X(t), Y(t)) = (i, l)\} = P^x_i P^y_l
\]

Hence, we need to show that

\[
P^x_i P^y_l v^x_i P^x_{ij} = P^x_j P^y_l v^x_j P^x_{ji}
\]

(That is, the rate from (i, l) to (j, l) equals the rate from (j, l) to (i, l).) But this
follows from the fact that the rate from i to j in X(t) equals the rate from j to i;
that is,

\[
P^x_i v^x_i P^x_{ij} = P^x_j v^x_j P^x_{ji}
\]

The analysis is similar in looking at pairs (i, l) and (i, k).
33. Suppose first that the waiting room is of infinite size. Let $X_i(t)$ denote the number
of customers at server i, i = 1, 2. Then since each of the M/M/1 processes
$\{X_i(t)\}$ is time reversible, it follows from Exercise 28 that the vector process
$\{(X_1(t), X_2(t)), t \ge 0\}$ is a time reversible Markov chain. Now the process of
interest is just the truncation of this vector process to the set of states A where

\[
A = \{(0, m): m \le 4\} \cup \{(n, 0): n \le 4\} \cup \{(n, m): nm > 0, n + m \le 5\}
\]

Hence, the probability that there are n with server 1 and m with server 2 is

\[
P_{n,m} = C\left(\frac{\lambda_1}{\mu_1}\right)^n \left(\frac{\lambda_2}{\mu_2}\right)^m, \quad (n, m) \in A
\]

where $\lambda_i$ and $\mu_i$ denote the arrival and service rates at server i. The constant C is
determined from

\[
\sum P_{n,m} = 1
\]

where the sum is over all (n, m) in A.

37. (a) The state is $(n_1, \ldots, n_k)$ if there are $n_i$ type i patients in the hospital, for all
i = 1, ..., k.

(b) It is an M/M/∞ birth and death process, and thus time reversible.

(c) Because $N_i(t)$, $t \ge 0$ are independent processes for i = 1, ..., k, the vector
process is a time reversible continuous time Markov chain.

(d)
\[
P(n_1, \ldots, n_k) = \prod_{i=1}^{k} e^{-\lambda_i/\mu_i} (\lambda_i/\mu_i)^{n_i} / n_i!
\]

(e) As a truncation of a time reversible continuous time Markov chain, it has
stationary probabilities

\[
P(n_1, \ldots, n_k) = K \prod_{i=1}^{k} (\lambda_i/\mu_i)^{n_i} / n_i!, \quad (n_1, \ldots, n_k) \in A
\]

where $A = \{(n_1, \ldots, n_k) : \sum_{i=1}^{k} n_i w_i \le C\}$, and K is such that

\[
K \sum_{(n_1, \ldots, n_k) \in A} \prod_{i=1}^{k} (\lambda_i/\mu_i)^{n_i} / n_i! = 1
\]

(f) With $r_i$ equal to the rate at which type i patients are admitted,

\[
r_i = \sum_{(n_1, \ldots, n_i + 1, \ldots, n_k) \in A} \lambda_i P(n_1, \ldots, n_k)
\]

(g) $\sum_{i=1}^{k} r_i \Big/ \sum_{i=1}^{k} \lambda_i$
38. (a) The state is the set of idle servers.

(b) For $i \in S$, $j \notin S$, the infinitesimal rates of the chain are

\[
q_{S, S-i} = \lambda/|S|, \quad q_{S, S+j} = \mu_j
\]

where |S| is the number of elements in S. The time reversibility equations are

\[
P(S)\lambda/|S| = P(S - i)\mu_i
\]

which has a solution

\[
P(S) = P_0\, |S|! \prod_{k \in S} (\mu_k/\lambda)
\]

where $P_0$, the probability there are no idle servers, is found from

\[
P_0\left[1 + \sum_S |S|! \prod_{k \in S} (\mu_k/\lambda)\right] = 1
\]

where the preceding sum is over all nonempty subsets of {1, ..., n}.

39. (a) The state is (z'i , , i k ) if Oi, ..., i*} is the set of idle servers, with i \ having 

been idle the longest, F the second longest, and so on. 

(b) and 

(c) For j (f: {/ 1 ,..., 4}, the infinitesimal rates of the chain are 


q(ii,—,ik),(<i,-Jk-0 ~ qq i. ik)Ah,—,ikJ) ~ Pi 

The time reversibility equations are 


P(iu i k )X = P(;T,..., zT-i)F4 


giving the solution 

POi, ■ ■ ■, ik) = il " ' xk P(0) 

where P(0) is the probability there are no idle servers. 

40. The time reversible equations are

\[
P(i)\frac{v_i}{n - 1} = P(j)\frac{v_j}{n - 1}
\]

yielding the solution

\[
P(j) = \frac{1/v_j}{\sum_{i=1}^{n} 1/v_i}
\]

Hence, the chain is time reversible with long run proportions given by the preceding.

41. Show in Example 6.22 that the limiting probabilities satisfy Equations (6.33), 
(6.34), and (6.35). 
42. Because the stationary departure process from an M/M/1 queue is a Poisson process,
it follows that the number of customers with server 2 has the stationary
distribution of an M/M/1 system.

43. We make the conjecture that the reverse chain is a system of same type, except that
the Poisson arrivals at rate λ arrive at server 3, then go to server 2, then to server
1, and then depart the system. Let $e_k$ be the 3-vector with 1 in position k and 0
elsewhere. With the state being $i = (i_1, i_2, i_3)$ when there are $i_j$ customers at
server j for j = 1, 2, 3, the instantaneous transition rates of the chain are

\[
\begin{aligned}
q_{(i,j,k),(i+1,j,k)} &= \lambda, \\
q_{(i,j,k),(i-1,j+1,k)} &= \mu_1, \quad i > 0 \\
q_{(i,j,k),(i,j-1,k+1)} &= \mu_2, \quad j > 0 \\
q_{(i,j,k),(i,j,k-1)} &= \mu_3, \quad k > 0
\end{aligned}
\]

whereas the conjectured instantaneous rates for the reversed chain are

\[
\begin{aligned}
q^*_{(i,j,k),(i,j,k+1)} &= \lambda \\
q^*_{(i,j,k),(i,j+1,k-1)} &= \mu_3, \quad k > 0 \\
q^*_{(i,j,k),(i+1,j-1,k)} &= \mu_2, \quad j > 0 \\
q^*_{(i,j,k),(i-1,j,k)} &= \mu_1, \quad i > 0
\end{aligned}
\]

The conjecture is correct if we can find probabilities P(i, j, k) that satisfy the
reverse time equations when the preceding are the instantaneous rates for the
reversed chain, and it is easy to check that

\[
P(i, j, k) = K\left(\frac{\lambda}{\mu_1}\right)^i \left(\frac{\lambda}{\mu_2}\right)^j \left(\frac{\lambda}{\mu_3}\right)^k
\]

satisfy.

49. (a) The matrix P* can be written as

\[
P^* = I + R/v
\]

and so $P^{*n}_{ij}$ can be obtained by taking the i, j element of $(I + R/v)^n$, which
gives the result when v = n/t.

(b) Uniformization shows that $P_{ij}(t) = E[P^{*N}_{ij}]$, where N is independent of the
Markov chain with transition probabilities $P^*_{ij}$ and is Poisson distributed with
mean vt. Since a Poisson random variable with mean vt has standard deviation
$(vt)^{1/2}$, it follows that for large values of vt it should be near vt. (For instance,
a Poisson random variable with mean $10^6$ has standard deviation $10^3$ and thus
will, with high probability, be within 3000 of $10^6$.) Hence, since for fixed i
and j, $P^{*m}_{ij}$ should not vary much for values of m about vt where vt is large,
it follows that, for large vt,

\[
E[P^{*N}_{ij}] \approx P^{*n}_{ij}, \quad \text{where } n = vt
\]
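Chapter 7

3. (a)
\[
P(N(t) = n \mid S_n = y) = \begin{cases} 1 - F(t - y), & \text{if } y \le t \\ 0, & \text{if } y > t \end{cases}
\]

(b)
\[
\begin{aligned}
P(N(t) = n) &= \int_0^{\infty} P(N(t) = n \mid S_n = y) f_{S_n}(y)\,dy \\
&= \int_0^t e^{-\lambda(t - y)} \lambda e^{-\lambda y} (\lambda y)^{n-1} / (n - 1)!\,dy \\
&= \frac{e^{-\lambda t}\lambda^n}{(n - 1)!} \int_0^t y^{n-1}\,dy \\
&= \frac{e^{-\lambda t}(\lambda t)^n}{n!}
\end{aligned}
\]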

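6. (a) Consider a Poisson process having rate λ and say that an event of the renewal
process occurs whenever one of the events numbered r, 2r, 3r, ... of the Poisson
process occurs. Then

\[
\begin{aligned}
P\{N(t) \ge n\} &= P\{nr \text{ or more Poisson events by } t\} \\
&= \sum_{i=nr}^{\infty} e^{-\lambda t}(\lambda t)^i / i!
\end{aligned}
\]

(b)
\[
\begin{aligned}
E[N(t)] &= \sum_{n=1}^{\infty} P\{N(t) \ge n\} = \sum_{n=1}^{\infty} \sum_{i=nr}^{\infty} e^{-\lambda t}(\lambda t)^i / i! \\
&= \sum_{i=r}^{\infty} \sum_{n=1}^{[i/r]} e^{-\lambda t}(\lambda t)^i / i! = \sum_{i=r}^{\infty} [i/r]\, e^{-\lambda t}(\lambda t)^i / i!
\end{aligned}
\]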

8. (a) The number of replaced machines by time t constitutes a renewal process.
The time between replacements equals T, if the lifetime of the new machine is
≥ T; x, if the lifetime of the new machine is x, x < T. Hence,

\[
E[\text{time between replacements}] = \int_0^T x f(x)\,dx + T[1 - F(T)]
\]

and the result follows by Proposition 3.1.

(b) The number of machines that have failed in use by time t constitutes a renewal
process. The mean time between in-use failures, E[F], can be calculated by
conditioning on the lifetime of the initial machine as E[F] = E[E[F | lifetime
of initial machine]]. Now

\[
E[F \mid \text{lifetime of machine is } x] = \begin{cases} x, & \text{if } x \le T \\ T + E[F], & \text{if } x > T \end{cases}
\]

Hence,

\[
E[F] = \int_0^T x f(x)\,dx + (T + E[F])[1 - F(T)]
\]

or

\[
E[F] = \frac{\int_0^T x f(x)\,dx + T[1 - F(T)]}{F(T)}
\]

and the result follows from Proposition 3.1.

13. With $W_i$ equal to your winnings in game i, $i \ge 1$, and N the number of games
played, Wald's equation yields

\[
E[X] = E[N]E[W] = 0
\]

With $p_i = P(X = i)$, $p_1 = 1/2 + 1/8 = 5/8$, $p_{-1} = 1/4$, $p_{-3} = 1/8$, verifying
that E[X] = 0.

18. We can imagine that a renewal corresponds to a machine failure, and each time a
new machine is put in use its life distribution will be exponential with rate $\mu_1$ with
probability p, and exponential with rate $\mu_2$ otherwise. Hence, if our state is the
index of the exponential life distribution of the machine presently in use, then this
is a two-state continuous-time Markov chain with intensity rates

\[
q_{1,2} = \mu_1(1 - p), \quad q_{2,1} = \mu_2 p
\]

Hence,

\[
P_{11}(t) = \frac{\mu_1(1 - p)}{\mu_1(1 - p) + \mu_2 p} \exp\{-[\mu_1(1 - p) + \mu_2 p]t\} + \frac{\mu_2 p}{\mu_1(1 - p) + \mu_2 p}
\]

with similar expressions for the other transition probabilities ($P_{12}(t) = 1 - P_{11}(t)$,
and $P_{22}(t)$ is the same with $\mu_2 p$ and $\mu_1(1 - p)$ switching places). Conditioning
on the initial machine now gives

\[
\begin{aligned}
E[Y(t)] &= pE[Y(t) \mid X(0) = 1] + (1 - p)E[Y(t) \mid X(0) = 2] \\
&= p\left[\frac{P_{11}(t)}{\mu_1} + \frac{P_{12}(t)}{\mu_2}\right] + (1 - p)\left[\frac{P_{21}(t)}{\mu_1} + \frac{P_{22}(t)}{\mu_2}\right]
\end{aligned}
\]

Finally, we can obtain m(t) from

\[
\mu[m(t) + 1] = t + E[Y(t)]
\]

where

\[
\mu = p/\mu_1 + (1 - p)/\mu_2
\]

is the mean interarrival time.
22. (a) Let X denote the length of time that J keeps a car. Let I equal 1 if there is a
breakdown by time T and equal 0 otherwise. Then

\[
\begin{aligned}
E[X] &= E[X \mid I = 1](1 - e^{-\lambda T}) + E[X \mid I = 0]e^{-\lambda T} \\
&= \left(T + \frac{1}{\mu}\right)(1 - e^{-\lambda T}) + \left(T + \frac{1}{\lambda}\right)e^{-\lambda T} \\
&= T + \frac{1 - e^{-\lambda T}}{\mu} + \frac{e^{-\lambda T}}{\lambda}
\end{aligned}
\]

1/E[X] is the rate that J buys a new car.

(b) Let W equal the total cost involved with purchasing a car. Then, with Y
equal to the time of the first breakdown

\[
\begin{aligned}
E[W] &= \int_0^{\infty} E[W \mid Y = y]\lambda e^{-\lambda y}\,dy \\
&= C + \int_0^T r(1 + \mu(T - y) + 1)\lambda e^{-\lambda y}\,dy + \int_T^{\infty} r\lambda e^{-\lambda y}\,dy \\
&= C + r(2 - e^{-\lambda T}) + r\int_0^T \mu(T - y)\lambda e^{-\lambda y}\,dy
\end{aligned}
\]

J's long run average cost is E[W]/E[X].


23. (a) Say that a new cycle begins each time A wins a point. With N equal to the
number of points in a cycle

\[
E[N] = 1 + q_a/p_b
\]

where the preceding used that, starting with B serving, the number of points
played until A wins a point is geometric with parameter $p_b$. Hence, by renewal
reward, the proportion of points won by A is $1/E[N] = \frac{p_b}{p_b + q_a}$.

(b) $(p_a + p_b)/2$

(c) — 

pi 

30. A(t) t — S N(t) 


(c) y b > (Pa + Pb)/ 2 is equivalent to p a q a > p h qb- 


= 1 - 


= 1 - 


Sn( t) 
t 

Sn( t) N(t) 
N(t ) t 


The result follows since .S' ; v(r) / N(t ) —>■ p (by the strong law of large numbers) and 
N(t)/t -> l/p. 

35. (a) We can view this as an M/G/∞ system where a satellite launching corresponds to an arrival and \(F\) is the service distribution. Hence,

\[ P\{X(t) = k\} = e^{-\lambda(t)}[\lambda(t)]^k/k! \]

where \(\lambda(t) = \lambda\int_0^t (1 - F(s))\,ds\).

(b) By viewing the system as an alternating renewal process that is on when there is at least one satellite orbiting, we obtain

\[ \lim P\{X(t) = 0\} = \frac{1/\lambda}{1/\lambda + E[T]} \]

where \(T\), the on time in a cycle, is the quantity of interest. From part (a)

\[ \lim P\{X(t) = 0\} = e^{-\lambda\mu} \]

where \(\mu = \int_0^\infty (1 - F(s))\,ds\) is the mean time that a satellite orbits. Hence,

\[ e^{-\lambda\mu} = \frac{1/\lambda}{1/\lambda + E[T]} \]

so that

\[ E[T] = \frac{1 - e^{-\lambda\mu}}{\lambda e^{-\lambda\mu}} \]

42. (a) \( F_e(x) = \dfrac{1}{\mu}\displaystyle\int_0^x e^{-y/\mu}\,dy = 1 - e^{-x/\mu} \)

(b) \( F_e(x) = \dfrac{1}{c}\displaystyle\int_0^x dy = \dfrac{x}{c}, \quad 0 \le x \le c \)

(c) You will receive a ticket if, starting when you park, an official appears within one hour. From Example 7.23 the time until the official appears has the distribution \(F_e\) which, by part (b), is the uniform distribution on (0, 2). Thus, the probability is equal to 1/2.

44. (a) Let \(N_i\) denote the number of passengers that get on bus \(i\). If we interpret \(X_i\) as the reward incurred at time \(i\) then we have a renewal reward process whose \(i\)th cycle is of length \(N_i\), and has reward \(X_{N_1 + \cdots + N_{i-1} + 1} + \cdots + X_{N_1 + \cdots + N_i}\). Hence, part (a) follows because \(N\) is the time and \(X_1 + \cdots + X_N\) is the cost of the first cycle.

(b) Condition on \(N(t)\) and use that conditional on \(N(t) = n\) the \(n\) arrival times are independently and uniformly distributed on \((0, t)\). As \(S \equiv X_1 + \cdots + X_N\) is the number of these \(n\) passengers whose waiting time is less than \(x\), this gives

\[ E[S|T = t, N(t) = n] = \begin{cases} nx/t, & \text{if } x < t \\ n, & \text{if } x > t \end{cases} \]

That is, \(E[S|T = t, N(t)] = N(t)\min(x, t)/t\). Taking expectations yields

\[ E[S|T = t] = \lambda\min(x, t) \]

(c) From (b), \(E[S|T] = \lambda\min(x, T)\) and (c) follows upon taking expectations.

(d) This follows from parts (a) and (c) using that

\[ E[\min(x, T)] = \int_0^\infty P(\min(x, T) > t)\,dt = \int_0^x P(T > t)\,dt \]

along with the identity \(E[N] = \lambda E[T]\).

(e) Because the waiting time for an arrival is the time until the next bus, the preceding result yields the PASTA result that the proportion of arrivals who see the excess life of the renewal process of bus arrivals to be less than \(x\) is equal to the proportion of time it is less than \(x\).

49. Think of each interarrival time as consisting of \(n\) independent phases, each of which is exponentially distributed with rate \(\lambda\), and consider the semi-Markov process whose state at any time is the phase of the present interarrival time. Hence, this semi-Markov process goes from state 1 to 2 to 3 ... to \(n\) to 1, and so on. Also the time spent in each state has the same distribution. Thus, clearly the limiting probability of this semi-Markov chain is \(P_i = 1/n\), \(i = 1, \ldots, n\). To compute \(\lim P\{Y(t) < a\}\), we condition on the phase at time \(t\) and note that if it is \(n - i + 1\), which will be the case with probability \(1/n\), then the time until a renewal occurs will be the sum of \(i\) exponential phases, which will thus have a gamma distribution with parameters \(i\) and \(\lambda\).

52. (a) If \(T\) is exponential then \(E[T^2]/E[T] = 2E[T]\). Hence, \(\lambda E[T^2]/(2E[T]) = \lambda E[T] = E[N]\).

(b) Because we are averaging over all time, we are giving more weight to those cycles (times between bus arrivals) that are large.

(c) Because buses arrive according to a Poisson process, the average number of waiting people seen by a bus must, by PASTA, be equal to the average number waiting when averaged over all time.


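53. Letting \(X_i = 1\) if flip \(i\) comes up heads, and 0 if it comes up tails, we want \(E\left[\sum_{i=1}^N X_i\right]\), where \(N\) is the number of flips until the pattern appears. With \(q = 1 - p\),

\[ \begin{aligned}
E[N] &= \frac{1}{p^4q^3} + \frac{1}{p^3q^2} + E[N_{hth}] \\
&= \frac{1}{p^4q^3} + \frac{1}{p^3q^2} + \frac{1}{p^2q} + E[N_h] \\
&= \frac{1}{p^4q^3} + \frac{1}{p^3q^2} + \frac{1}{p^2q} + \frac{1}{p}
\end{aligned} \]

Because \(N\) is a stopping time for the sequence \(X_i\), \(i \ge 1\), it follows from Wald's equation that

\[ E\left[\sum_{i=1}^N X_i\right] = E[N]p = \frac{1}{p^3q^3} + \frac{1}{p^2q^2} + \frac{1}{pq} + 1 \]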


Chapter 8 


2. This problem can be modeled by an M/M/1 queue in which \(\lambda = 6\), \(\mu = 8\). The average cost rate will be

$10 per hour per machine × average number of broken machines

The average number of broken machines is just \(L\), which can be computed from Equation (3.2):

\[ L = \frac{\lambda}{\mu - \lambda} = \frac{6}{2} = 3 \]

Hence, the average cost rate = $30/hour.
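
The arithmetic of Exercise 2 can be reproduced with a few lines of code. This is a minimal sketch; the function name `mm1_average_number` is made up for this example.

```python
def mm1_average_number(lam, mu):
    """Average number in an M/M/1 system, L = lam / (mu - lam).

    Only meaningful for a stable queue, that is, lam < mu.
    """
    if lam >= mu:
        raise ValueError("the M/M/1 queue is unstable unless lam < mu")
    return lam / (mu - lam)


# Values from the exercise: lambda = 6, mu = 8, $10 per hour per broken machine.
average_broken = mm1_average_number(6.0, 8.0)
cost_rate = 10.0 * average_broken

if __name__ == "__main__":
    print(average_broken, cost_rate)
```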

6. To compute \(W\) for the M/M/2, set up balance equations as follows:

\[ \begin{aligned}
\lambda P_0 &= \mu P_1 \quad \text{(each server has rate } \mu) \\
(\lambda + \mu)P_1 &= \lambda P_0 + 2\mu P_2 \\
(\lambda + 2\mu)P_n &= \lambda P_{n-1} + 2\mu P_{n+1}, \quad n \ge 2
\end{aligned} \]

These have solutions \(P_n = (\rho^n/2^{n-1})P_0\) where \(\rho = \lambda/\mu\). The boundary condition \(\sum_{n=0}^\infty P_n = 1\) implies

\[ P_0 = \frac{1 - \rho/2}{1 + \rho/2} = \frac{2 - \rho}{2 + \rho} \]

Now we have \(P_n\), so we can compute \(L\), and hence \(W\) from \(L = \lambda W\):

\[ \begin{aligned}
L = \sum_{n=0}^\infty nP_n &= \rho P_0\sum_{n=0}^\infty n\left(\frac{\rho}{2}\right)^{n-1} = 2P_0\sum_{n=0}^\infty n\left(\frac{\rho}{2}\right)^n \\
&= 2\,\frac{(2 - \rho)}{(2 + \rho)}\,\frac{(\rho/2)}{(1 - \rho/2)^2} \quad \text{(See derivation of Equation (8.7).)} \\
&= \frac{4\rho}{(2 + \rho)(2 - \rho)} \\
&= \frac{4\mu\lambda}{(2\mu + \lambda)(2\mu - \lambda)}
\end{aligned} \]

From \(L = \lambda W\) we have

\[ W = W(M/M/2) = \frac{4\mu}{(2\mu + \lambda)(2\mu - \lambda)} \]

The M/M/1 queue with service rate \(2\mu\) has

\[ W(M/M/1) = \frac{1}{2\mu - \lambda} \]

from Equation (8.8). We assume that in the M/M/1 queue, \(2\mu > \lambda\) so that the queue is stable. But then \(4\mu > 2\mu + \lambda\), or \(4\mu/(2\mu + \lambda) > 1\), which implies \(W(M/M/2) > W(M/M/1)\). The intuitive explanation is that if one finds the queue empty in the M/M/2 case, it would do no good to have two servers. One would be better off with one faster server. Now let \(W_Q^1 = W_Q(M/M/1)\) and \(W_Q^2 = W_Q(M/M/2)\). Then,

\[ \begin{aligned}
W_Q^1 &= W(M/M/1) - 1/(2\mu) \\
W_Q^2 &= W(M/M/2) - 1/\mu
\end{aligned} \]

So,

\[ W_Q^1 = \frac{\lambda}{2\mu(2\mu - \lambda)} \quad \text{from Equation (8.8)} \]

and

\[ W_Q^2 = \frac{\lambda^2}{\mu(2\mu - \lambda)(2\mu + \lambda)} \]

Then,

\[ W_Q^1 > W_Q^2 \iff \frac{1}{2} > \frac{\lambda}{2\mu + \lambda} \iff \lambda < 2\mu \]

Since we assume \(\lambda < 2\mu\) for stability in the M/M/1 case, \(W_Q^2 < W_Q^1\) whenever this comparison is possible, that is, whenever \(\lambda < 2\mu\).
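
The comparison in Exercise 6 between an M/M/2 queue with per-server rate \(\mu\) and an M/M/1 queue with rate \(2\mu\) can be checked numerically. The sketch below uses the closed forms \(W(M/M/2) = 4\mu/((2\mu+\lambda)(2\mu-\lambda))\) and \(W(M/M/1) = 1/(2\mu-\lambda)\), and cross-checks the first one by summing \(nP_n\) directly. Function names and the sample values are assumptions made for illustration.

```python
def w_mm2(lam, mu):
    """Average time in system for M/M/2 with per-server rate mu."""
    return 4.0 * mu / ((2.0 * mu + lam) * (2.0 * mu - lam))


def w_mm1_fast(lam, mu):
    """Average time in system for M/M/1 with service rate 2*mu."""
    return 1.0 / (2.0 * mu - lam)


def wq_mm2(lam, mu):
    """Average time in queue for M/M/2 (mean service time is 1/mu)."""
    return w_mm2(lam, mu) - 1.0 / mu


def wq_mm1_fast(lam, mu):
    """Average time in queue for the fast M/M/1 (mean service time 1/(2*mu))."""
    return w_mm1_fast(lam, mu) - 1.0 / (2.0 * mu)


def w_mm2_by_summation(lam, mu, terms=2000):
    """Cross-check: W = L/lam with L = sum of n*P_n, P_n = rho^n / 2^(n-1) * P_0."""
    rho = lam / mu
    p0 = (2.0 - rho) / (2.0 + rho)
    total = sum(n * 2.0 * (rho / 2.0) ** n * p0 for n in range(1, terms))
    return total / lam


if __name__ == "__main__":
    lam, mu = 3.0, 2.0  # hypothetical rates with lam < 2*mu
    print(w_mm2(lam, mu), w_mm1_fast(lam, mu))
    print(wq_mm2(lam, mu), wq_mm1_fast(lam, mu))
```

With these sample rates the time in system is larger for the M/M/2 queue, while the time in queue is larger for the single fast server, matching the two inequalities derived above.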

7. (a) \( \sum_n nP_n/\lambda \)

(b) \( e_0 = 1, \quad e_n = \displaystyle\prod_{i=0}^{n-1}\frac{\mu + i\alpha}{\mu + (i+1)\alpha}, \quad n > 0 \)

(c) \( P(n \mid \text{served}) = \dfrac{P_n e_n}{\sum_n P_n e_n} \)

(d) \( \displaystyle\sum_{n=1}^\infty P(n \mid \text{served})\sum_{i=0}^{n-1}\frac{1}{\mu + (i+1)\alpha} \)

(e) Let \(W_Q(s)\) and \(W_Q(n)\) be the averages of the time spent in queue by those that are served and by those that are not. Then

\[ \sum_{n>0}(n-1)P_n/\lambda = W_Q = \sum_n e_nP_n\,W_Q(s) + \left(1 - \sum_n e_nP_n\right)W_Q(n) \]

where \(W_Q(s)\) is given in part (d), and where the right hand equality is obtained by conditioning on whether an arrival was served.

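11. Let the state be \((n, m)\) if there are \(n\) families and \(m\) taxis waiting, \(nm = 0\). The time reversibility equations are

\[ \begin{aligned}
P_{n-1,0}\,\lambda &= P_{n,0}\,\mu, \quad n = 1, \ldots, N \\
P_{0,m-1}\,\mu &= P_{0,m}\,\lambda, \quad m = 1, \ldots, M
\end{aligned} \]

Solving yields

\[ \begin{aligned}
P_{n,0} &= (\lambda/\mu)^n P_{0,0}, \quad n = 0, 1, \ldots, N \\
P_{0,m} &= (\mu/\lambda)^m P_{0,0}, \quad m = 0, 1, \ldots, M
\end{aligned} \]

where

\[ \frac{1}{P_{0,0}} = \sum_{n=0}^N (\lambda/\mu)^n + \sum_{m=1}^M (\mu/\lambda)^m \]

(a) \( \sum_{m=0}^M P_{0,m} \)

(b) \( \sum_{n=0}^N P_{n,0} \)

(c) \( \dfrac{\sum_{n=0}^N nP_{n,0}}{\lambda(1 - P_{N,0})} \)

(d) \( \dfrac{\sum_{m=0}^M mP_{0,m}}{\mu(1 - P_{0,M})} \)

(e) \( 1 - P_{N,0} \)

When \(N = M = \infty\) the time reversibility equations become

\[ \begin{aligned}
P_{n-1,0}\,\lambda &= P_{n,0}(\mu + n\alpha), \quad n \ge 1 \\
P_{0,m-1}\,\mu &= P_{0,m}(\lambda + m\beta), \quad m \ge 1
\end{aligned} \]

which yields

\[ \begin{aligned}
P_{n,0} &= P_{0,0}\prod_{i=1}^n \frac{\lambda}{\mu + i\alpha}, \quad n \ge 1 \\
P_{0,m} &= P_{0,0}\prod_{i=1}^m \frac{\mu}{\lambda + i\beta}, \quad m \ge 1
\end{aligned} \]

The rest is similar to the preceding.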

13. (a)

\[ \begin{aligned}
\lambda P_0 &= \mu P_1 \\
(\lambda + \mu)P_1 &= \lambda P_0 + 2\mu P_2 \\
(\lambda + 2\mu)P_n &= \lambda P_{n-1} + 2\mu P_{n+1}, \quad n \ge 2
\end{aligned} \]

These are the same balance equations as for the M/M/2 queue and have solution

\[ P_0 = \frac{2\mu - \lambda}{2\mu + \lambda}, \qquad P_n = \frac{\lambda^n}{2^{n-1}\mu^n}P_0 \]

(b) The system goes from 0 to 1 at rate

\[ \lambda P_0 = \frac{\lambda(2\mu - \lambda)}{(2\mu + \lambda)} \]

The system goes from 2 to 1 at rate

\[ 2\mu P_2 = \frac{\lambda^2}{\mu}\,\frac{(2\mu - \lambda)}{(2\mu + \lambda)} \]

(c) Introduce a new state cl to indicate that the stock clerk is checking by himself. The balance equation for \(P_{cl}\) is

\[ (\lambda + \mu)P_{cl} = \mu P_2 \]

Hence,

\[ P_{cl} = \frac{\mu}{\lambda + \mu}P_2 = \frac{\lambda^2(2\mu - \lambda)}{2\mu(\lambda + \mu)(2\mu + \lambda)} \]

Finally, the proportion of time the stock clerk is checking is

\[ P_{cl} + \sum_{n=2}^\infty P_n = P_{cl} + \frac{\lambda^2}{\mu(2\mu + \lambda)} \]

21. (a) \(\lambda_1 P_{10}\).

(b) \(\lambda_2(P_0 + P_{10})\).

(c) \(\lambda_1 P_{10}/[\lambda_1 P_{10} + \lambda_2(P_0 + P_{10})]\).

(d) This is equal to the fraction of server 2's customers that are type 1 multiplied by the proportion of time server 2 is busy. (This is true since the amount of time server 2 spends with a customer does not depend on which type of customer it is.) By (c) the answer is thus

\[ \frac{(P_{01} + P_{11})\lambda_1 P_{10}}{\lambda_1 P_{10} + \lambda_2(P_0 + P_{10})} \]


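24. The states are now \(n\), \(n \ge 0\), and \(n'\), \(n \ge 1\), where the state is \(n\) when there are \(n\) in the system and no breakdown, and \(n'\) when there are \(n\) in the system and a breakdown is in progress. The balance equations are

\[ \begin{aligned}
\lambda P_0 &= \mu P_1 \\
(\lambda + \mu + \alpha)P_n &= \lambda P_{n-1} + \mu P_{n+1} + \beta P_{n'}, \quad n \ge 1 \\
(\beta + \lambda)P_{1'} &= \alpha P_1 \\
(\beta + \lambda)P_{n'} &= \alpha P_n + \lambda P_{(n-1)'}, \quad n \ge 2 \\
\sum_{n=0}^\infty P_n + \sum_{n=1}^\infty P_{n'} &= 1
\end{aligned} \]

In terms of the solution to the preceding,

\[ L = \sum_{n=1}^\infty n(P_n + P_{n'}) \]

and so

\[ W = \frac{L}{\lambda} \]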
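28. If a customer leaves the system busy, the time until the next departure is the time of a service. If a customer leaves the system empty, the time until the next departure is the time until an arrival plus the time of a service.

Using moment generating functions we get

\[ \begin{aligned}
E[e^{sD}] &= \frac{\lambda}{\mu}E\{e^{sD} \mid \text{system left busy}\} + \left(1 - \frac{\lambda}{\mu}\right)E\{e^{sD} \mid \text{system left empty}\} \\
&= \frac{\lambda}{\mu}\,\frac{\mu}{\mu - s} + \left(1 - \frac{\lambda}{\mu}\right)E\{e^{s(X+Y)}\}
\end{aligned} \]

where \(X\) has the distribution of interarrival times, \(Y\) has the distribution of service times, and \(X\) and \(Y\) are independent. Then

\[ \begin{aligned}
E[e^{s(X+Y)}] &= E[e^{sX}e^{sY}] \\
&= E[e^{sX}]E[e^{sY}] \quad \text{by independence} \\
&= \frac{\lambda}{\lambda - s}\,\frac{\mu}{\mu - s}
\end{aligned} \]

So,

\[ E[e^{sD}] = \frac{\lambda}{\mu}\,\frac{\mu}{\mu - s} + \left(1 - \frac{\lambda}{\mu}\right)\frac{\lambda}{\lambda - s}\,\frac{\mu}{\mu - s} = \frac{\lambda}{(\lambda - s)} \]

By the uniqueness of generating functions, it follows that \(D\) has an exponential distribution with parameter \(\lambda\).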

36. The distributions of the queue size and busy period are the same for all three 
disciplines; that of the waiting time is different. However, the means are identical. 
This can be seen by using W = L/X, since L is the same for all. The smallest 
variance in the waiting time occurs under first-come, first-served and the largest 
under last-come, first-served. 

39. (a) \(a_0 = P_0\) due to Poisson arrivals. Assuming that each customer pays 1 per unit time while in service the cost identity of Equation (8.1) states that

average number in service = \(\lambda E[S]\)

or

\[ 1 - P_0 = \lambda E[S] \]

(b) Since \(a_0\) is the proportion of arrivals that have service distribution \(G_1\) and \(1 - a_0\) the proportion having service distribution \(G_2\), the result follows.

(c) We have

\[ P_0 = \frac{E[I]}{E[I] + E[B]} \]

and \(E[I] = 1/\lambda\) and thus,

\[ E[B] = \frac{1 - P_0}{\lambda P_0} = \frac{E[S]}{1 - \lambda E[S]} \]

Now from parts (a) and (b) we have

\[ E[S] = (1 - \lambda E[S])E[S_1] + \lambda E[S]E[S_2] \]

or

\[ E[S] = \frac{E[S_1]}{1 + \lambda E[S_1] - \lambda E[S_2]} \]

Substituting into \(E[B] = E[S]/(1 - \lambda E[S])\) now yields the result.

(d) \(a_0 = 1/E[C]\), implying that

\[ E[C] = \frac{E[S_1] + 1/\lambda - E[S_2]}{1/\lambda - E[S_2]} \]


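45. By regarding any breakdowns that occur during a service as being part of that service, we see that this is an M/G/1 model. We need to calculate the first two moments of a service time. Now the time of a service is the time \(T\) until something happens (either a service completion or a breakdown) plus any additional time \(A\). Thus,

\[ E[S] = E[T + A] = E[T] + E[A] \]

To compute \(E[A]\), we condition upon whether the happening is a service or a breakdown. This gives

\[ \begin{aligned}
E[A] &= E[A \mid \text{service}]\frac{\mu}{\mu + \alpha} + E[A \mid \text{breakdown}]\frac{\alpha}{\mu + \alpha} \\
&= E[A \mid \text{breakdown}]\frac{\alpha}{\mu + \alpha} \\
&= \left(\frac{1}{\beta} + E[S]\right)\frac{\alpha}{\mu + \alpha}
\end{aligned} \]

Since \(E[T] = 1/(\alpha + \mu)\) we obtain

\[ E[S] = \frac{1}{\alpha + \mu} + \left(\frac{1}{\beta} + E[S]\right)\frac{\alpha}{\mu + \alpha} \]

or

\[ E[S] = \frac{1}{\mu} + \frac{\alpha}{\mu\beta} \]

We also need \(E[S^2]\), which is obtained as follows:

\[ \begin{aligned}
E[S^2] &= E[(T + A)^2] \\
&= E[T^2] + 2E[AT] + E[A^2] \\
&= E[T^2] + 2E[A]E[T] + E[A^2]
\end{aligned} \]

The independence of \(A\) and \(T\) follows because the time of the first happening is independent of whether the happening was a service or a breakdown. Now,

\[ \begin{aligned}
E[A^2] &= E[A^2 \mid \text{breakdown}]\frac{\alpha}{\mu + \alpha} \\
&= \frac{\alpha}{\mu + \alpha}E[(\text{downtime} + S^*)^2] \\
&= \frac{\alpha}{\mu + \alpha}\left\{E[\text{down}^2] + 2E[\text{down}]E[S] + E[S^2]\right\} \\
&= \frac{\alpha}{\mu + \alpha}\left\{\frac{2}{\beta^2} + \frac{2}{\beta}\left[\frac{1}{\mu} + \frac{\alpha}{\mu\beta}\right] + E[S^2]\right\}
\end{aligned} \]

Hence,

\[ \begin{aligned}
E[S^2] = {} & \frac{2}{(\mu + \alpha)^2} + 2\left[\frac{\alpha}{\beta(\mu + \alpha)} + \frac{\alpha}{\mu + \alpha}\left(\frac{1}{\mu} + \frac{\alpha}{\mu\beta}\right)\right]\frac{1}{\mu + \alpha} \\
& + \frac{\alpha}{\mu + \alpha}\left\{\frac{2}{\beta^2} + \frac{2}{\beta}\left[\frac{1}{\mu} + \frac{\alpha}{\mu\beta}\right] + E[S^2]\right\}
\end{aligned} \]

Now solve for \(E[S^2]\). The desired answer is

\[ W_Q = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} \]

In the preceding, \(S^*\) is the additional service needed after the breakdown is over and \(S^*\) has the same distribution as \(S\). The preceding also uses the fact that the expected square of an exponential is twice the square of its mean.

Another way of calculating the moments of \(S\) is to use the representation

\[ S = \sum_{i=1}^N (T_i + B_i) + T_{N+1} \]

where \(N\) is the number of breakdowns while a customer is in service, \(T_i\) is the time starting when service commences for the \(i\)th time until a happening occurs, and \(B_i\) is the length of the \(i\)th breakdown. We now use the fact that, given \(N\), all of the random variables in the representation are independent exponentials with the \(T_i\) having rate \(\mu + \alpha\) and the \(B_i\) having rate \(\beta\). This yields

\[ \begin{aligned}
E[S|N] &= \frac{N + 1}{\mu + \alpha} + \frac{N}{\beta}, \\
\mathrm{Var}(S|N) &= \frac{N + 1}{(\mu + \alpha)^2} + \frac{N}{\beta^2}
\end{aligned} \]

Therefore, since \(1 + N\) is geometric with mean \((\mu + \alpha)/\mu\) (and variance \(\alpha(\alpha + \mu)/\mu^2\)) we obtain

\[ E[S] = \frac{1}{\mu} + \frac{\alpha}{\mu\beta} \]

and, using the conditional variance formula,

\[ \mathrm{Var}(S) = \left[\frac{1}{\mu + \alpha} + \frac{1}{\beta}\right]^2\frac{\alpha(\alpha + \mu)}{\mu^2} + \frac{1}{\mu(\mu + \alpha)} + \frac{\alpha}{\mu\beta^2} \]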
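
The two routes to the moments of \(S\) in Exercise 45 (solving the linear equation for \(E[S^2]\), and the conditional variance formula) should agree. The following sketch checks this with exact rational arithmetic, taking service rate \(\mu\), breakdown rate \(\alpha\), and repair rate \(\beta\). The function names and the sample rates are assumptions for this example; passing `Fraction` arguments keeps every step exact.

```python
from fractions import Fraction


def moments_by_recursion(mu, alpha, beta):
    """E[S] and E[S^2] from S = T + A, solving the linear equation for E[S^2]."""
    es = 1 / mu + alpha / (mu * beta)
    et = 1 / (mu + alpha)                 # E[T]
    et2 = 2 / (mu + alpha) ** 2           # E[T^2] for an exponential
    ea = (1 / beta + es) * alpha / (mu + alpha)
    # E[S^2] = known + alpha/(mu+alpha) * E[S^2]
    known = (et2 + 2 * ea * et
             + alpha / (mu + alpha) * (2 / beta ** 2 + 2 * es / beta))
    es2 = known * (mu + alpha) / mu
    return es, es2


def moments_by_conditioning(mu, alpha, beta):
    """E[S] and E[S^2] from the conditional variance formula."""
    es = 1 / mu + alpha / (mu * beta)
    var = ((1 / (mu + alpha) + 1 / beta) ** 2 * alpha * (alpha + mu) / mu ** 2
           + 1 / (mu * (mu + alpha)) + alpha / (mu * beta ** 2))
    return es, var + es ** 2


def average_wait_in_queue(lam, mu, alpha, beta):
    """W_Q = lam * E[S^2] / (2 * (1 - lam * E[S])), valid when lam * E[S] < 1."""
    es, es2 = moments_by_recursion(mu, alpha, beta)
    return lam * es2 / (2 * (1 - lam * es))


if __name__ == "__main__":
    rates = (Fraction(2), Fraction(1), Fraction(1))  # hypothetical mu, alpha, beta
    print(moments_by_recursion(*rates), moments_by_conditioning(*rates))
    print(average_wait_in_queue(Fraction(1, 4), *rates))
```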


47. The identity \(L = \lambda_a W\) gives that \(L = .8 \cdot 4 = 3.2\).

52. \(S_n\) is the service time of the \(n\)th customer; \(T_n\) is the time between the arrival of the \(n\)th and \((n + 1)\)st customer.


Chapter 9 


4. (a) \(\phi(x) = x_1\max(x_2, x_3, x_4)x_5\).

(b) \(\phi(x) = x_1\max(x_2x_4, x_3x_5)x_6\).

(c) \(\phi(x) = \max(x_1, x_2x_3)x_4\).
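
The three structure functions of Exercise 4, read as \(x_1\max(x_2, x_3, x_4)x_5\), \(x_1\max(x_2x_4, x_3x_5)x_6\), and \(\max(x_1, x_2x_3)x_4\), translate directly into code. In this sketch each state vector is a tuple of 0/1 values; the function names are chosen only for illustration.

```python
def phi_a(x):
    """Component 1, then any of components 2, 3, 4 in parallel, then component 5."""
    x1, x2, x3, x4, x5 = x
    return x1 * max(x2, x3, x4) * x5


def phi_b(x):
    """Component 1, then the parallel paths {2, 4} and {3, 5}, then component 6."""
    x1, x2, x3, x4, x5, x6 = x
    return x1 * max(x2 * x4, x3 * x5) * x6


def phi_c(x):
    """Component 1 in parallel with the series pair {2, 3}, then component 4."""
    x1, x2, x3, x4 = x
    return max(x1, x2 * x3) * x4


if __name__ == "__main__":
    print(phi_a((1, 0, 1, 0, 1)), phi_b((1, 1, 0, 1, 0, 1)), phi_c((0, 1, 1, 1)))
```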

6. A minimal cut set has to contain at least one component of each minimal path set. 

There are six minimal cut sets: {1, 5}, {1, 6}, {2, 5}, {2, 3, 6}, {3, 4, 6}, {4, 5}. 

12. The minimal path sets are {1, 4}, {1, 5}, {2, 4}, {2, 5}, {3, 4}, {3, 5}. With \(q_i = 1 - p_i\), the reliability function is

\[ \begin{aligned}
r(p) &= P\{\text{either of 1, 2, or 3 works}\}\,P\{\text{either of 4 or 5 works}\} \\
&= (1 - q_1q_2q_3)(1 - q_4q_5)
\end{aligned} \]
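
As a check on Exercise 12, the reliability can be computed by brute force from the minimal path sets {1, 4}, {1, 5}, {2, 4}, {2, 5}, {3, 4}, {3, 5} and compared with the product form \((1 - q_1q_2q_3)(1 - q_4q_5)\). The sketch below enumerates all \(2^5\) component states; the function names and the sample probabilities are assumptions for this example.

```python
from itertools import product

MIN_PATH_SETS = [{1, 4}, {1, 5}, {2, 4}, {2, 5}, {3, 4}, {3, 5}]


def system_works(state):
    """state maps component number to 0/1; the system works if some
    minimal path set has all of its components working."""
    return any(all(state[c] for c in path) for path in MIN_PATH_SETS)


def reliability_by_enumeration(p):
    """Sum the probabilities of all component states in which the system works."""
    components = sorted(p)
    total = 0.0
    for bits in product((0, 1), repeat=len(components)):
        state = dict(zip(components, bits))
        if system_works(state):
            prob = 1.0
            for c in components:
                prob *= p[c] if state[c] else 1.0 - p[c]
            total += prob
    return total


def reliability_closed_form(p):
    """(1 - q1*q2*q3) * (1 - q4*q5) with q_i = 1 - p_i."""
    q = {c: 1.0 - p[c] for c in p}
    return (1.0 - q[1] * q[2] * q[3]) * (1.0 - q[4] * q[5])


if __name__ == "__main__":
    p = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}  # hypothetical reliabilities
    print(reliability_by_enumeration(p), reliability_closed_form(p))
```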

17.

\[ \begin{aligned}
E[N^2] &= E[N^2 \mid N > 0]P\{N > 0\} \\
&\ge (E[N \mid N > 0])^2P\{N > 0\}, \quad \text{since } E[X^2] \ge (E[X])^2
\end{aligned} \]

Thus,

\[ \begin{aligned}
E[N^2]P\{N > 0\} &\ge (E[N \mid N > 0]P\{N > 0\})^2 \\
&= (E[N])^2
\end{aligned} \]

Let \(N\) denote the number of minimal path sets having all of its components functioning. Then \(r(p) = P\{N > 0\}\). Similarly, if we define \(N\) as the number of minimal cut sets having all of its components failed, then \(1 - r(p) = P\{N > 0\}\). In both cases we can compute expressions for \(E[N]\) and \(E[N^2]\) by writing \(N\) as the sum of indicator (i.e., Bernoulli) random variables. Then we can use the inequality to derive bounds on \(r(p)\).
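22. (a)

\[ \bar{F}_t(a) = P\{X > t + a \mid X > t\} = \frac{P\{X > t + a\}}{P\{X > t\}} = \frac{\bar{F}(t + a)}{\bar{F}(t)} \]

(b) Suppose \(\lambda(t)\) is increasing. Recall that

\[ \bar{F}(t) = e^{-\int_0^t \lambda(s)\,ds} \]

Hence,

\[ \frac{\bar{F}(t + a)}{\bar{F}(t)} = \exp\left\{-\int_t^{t+a}\lambda(s)\,ds\right\} \]

which decreases in \(t\) since \(\lambda(t)\) is increasing. To go the other way, suppose \(\bar{F}(t + a)/\bar{F}(t)\) decreases in \(t\). Now when \(a\) is small

\[ \frac{\bar{F}(t + a)}{\bar{F}(t)} \approx e^{-a\lambda(t)} \]

Hence, \(e^{-a\lambda(t)}\) must decrease in \(t\) and thus \(\lambda(t)\) increases.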


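25. For \(x \ge \xi\),

\[ 1 - p = \bar{F}(\xi) = \bar{F}(x(\xi/x)) \ge [\bar{F}(x)]^{\xi/x} \]

since IFRA. Hence, \(\bar{F}(x) \le (1 - p)^{x/\xi} = e^{-\theta x}\).

For \(x \le \xi\),

\[ \bar{F}(x) = \bar{F}(\xi(x/\xi)) \ge [\bar{F}(\xi)]^{x/\xi} \]

since IFRA. Hence, \(\bar{F}(x) \ge (1 - p)^{x/\xi} = e^{-\theta x}\).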

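30. \( r(p) = p_1p_2p_3 + p_1p_2p_4 + p_1p_3p_4 + p_2p_3p_4 - 3p_1p_2p_3p_4 \)

\[ r(1 - F(t)) = \begin{cases} 2(1 - t)^2(1 - t/2) + 2(1 - t)(1 - t/2)^2 - 3(1 - t)^2(1 - t/2)^2, & 0 \le t \le 1 \\ 0, & 1 \le t \le 2 \end{cases} \]

\[ \begin{aligned}
E[\text{lifetime}] &= \int_0^1 \left[2(1 - t)^2(1 - t/2) + 2(1 - t)(1 - t/2)^2 - 3(1 - t)^2(1 - t/2)^2\right]dt \\
&= \frac{31}{60}
\end{aligned} \]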
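
The value 31/60 in Exercise 30 can be confirmed by integrating the polynomial \(2(1-t)^2(1-t/2) + 2(1-t)(1-t/2)^2 - 3(1-t)^2(1-t/2)^2\) over \([0, 1]\) exactly. The sketch below represents polynomials as coefficient lists (lowest degree first) and uses rational arithmetic; the helper names are made up for this example.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(*polys):
    """Add polynomials of possibly different lengths."""
    n = max(len(p) for p in polys)
    return [sum((p[k] for p in polys if k < len(p)), Fraction(0)) for k in range(n)]


def poly_scale(c, a):
    """Multiply every coefficient by the constant c."""
    return [c * x for x in a]


def integrate_0_1(a):
    """Integral over [0, 1]: the coefficient of t^k contributes c / (k + 1)."""
    return sum((c / (k + 1) for k, c in enumerate(a)), Fraction(0))


u = [Fraction(1), Fraction(-1)]       # 1 - t
v = [Fraction(1), Fraction(-1, 2)]    # 1 - t/2
uu, vv = poly_mul(u, u), poly_mul(v, v)

integrand = poly_add(
    poly_scale(2, poly_mul(uu, v)),
    poly_scale(2, poly_mul(u, vv)),
    poly_scale(-3, poly_mul(uu, vv)),
)
expected_lifetime = integrate_0_1(integrand)

if __name__ == "__main__":
    print(expected_lifetime)
```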
Chapter 10


1. \(B(s) + B(t) = 2B(s) + B(t) - B(s)\). Now \(2B(s)\) is normal with mean 0 and variance \(4s\) and \(B(t) - B(s)\) is normal with mean 0 and variance \(t - s\). Because \(B(s)\) and \(B(t) - B(s)\) are independent, it follows that \(B(s) + B(t)\) is normal with mean 0 and variance \(4s + t - s = 3s + t\).

3.

\[ \begin{aligned}
E[B(t_1)B(t_2)B(t_3)] &= E[E[B(t_1)B(t_2)B(t_3) \mid B(t_1), B(t_2)]] \\
&= E[B(t_1)B(t_2)E[B(t_3) \mid B(t_1), B(t_2)]] \\
&= E[B(t_1)B(t_2)B(t_2)] \\
&= E[E[B(t_1)B^2(t_2) \mid B(t_1)]] \\
&= E[B(t_1)E[B^2(t_2) \mid B(t_1)]] \\
&= E[B(t_1)\{(t_2 - t_1) + B^2(t_1)\}] \quad (*) \\
&= E[B^3(t_1)] + (t_2 - t_1)E[B(t_1)] \\
&= 0
\end{aligned} \]

where the equality (*) follows since given \(B(t_1)\), \(B(t_2)\) is normal with mean \(B(t_1)\) and variance \(t_2 - t_1\). Also, \(E[B^3(t)] = 0\) since \(B(t)\) is normal with mean 0.

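5.

\[ \begin{aligned}
P\{T_1 < T_{-1} < T_2\} &= P\{\text{hit 1 before } -1 \text{ before 2}\} \\
&= P\{\text{hit 1 before } -1\} \times P\{\text{hit } -1 \text{ before 2} \mid \text{hit 1 before } -1\} \\
&= \frac{1}{2}P\{\text{down 2 before up 1}\} \\
&= \frac{1}{2}\cdot\frac{1}{3} = \frac{1}{6}
\end{aligned} \]

The next to last equality follows by looking at the Brownian motion when it first hits 1.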

10. (a) Writing \(X(t) = X(s) + X(t) - X(s)\) and using independent increments, we see that given \(X(s) = c\), \(X(t)\) is distributed as \(c + X(t) - X(s)\). By stationary increments this has the same distribution as \(c + X(t - s)\), and is thus normal with mean \(c + \mu(t - s)\) and variance \((t - s)\sigma^2\).

(b) Use the representation \(X(t) = \sigma B(t) + \mu t\), where \(\{B(t)\}\) is standard Brownian motion. Using Equation (10.4), but reversing \(s\) and \(t\), we see that the conditional distribution of \(B(t)\) given that \(B(s) = (c - \mu s)/\sigma\) is normal with mean \(t(c - \mu s)/(\sigma s)\) and variance \(t(s - t)/s\). Thus, the conditional distribution of \(X(t)\) given that \(X(s) = c\), \(s > t\), is normal with mean

\[ \sigma\left[\frac{t(c - \mu s)}{\sigma s}\right] + \mu t = \frac{(c - \mu s)t}{s} + \mu t \]

and variance

\[ \frac{\sigma^2 t(s - t)}{s} \]
19. Since knowing the value of \(Y(t)\) is equivalent to knowing \(B(t)\), we have

\[ \begin{aligned}
E[Y(t) \mid Y(u), 0 \le u \le s] &= e^{-c^2t/2}E[e^{cB(t)} \mid B(u), 0 \le u \le s] \\
&= e^{-c^2t/2}E[e^{cB(t)} \mid B(s)]
\end{aligned} \]

Now, given \(B(s)\), the conditional distribution of \(B(t)\) is normal with mean \(B(s)\) and variance \(t - s\). Using the formula for the moment generating function of a normal random variable we see that

\[ \begin{aligned}
e^{-c^2t/2}E[e^{cB(t)} \mid B(s)] &= e^{-c^2t/2}e^{cB(s) + (t - s)c^2/2} \\
&= e^{-c^2s/2}e^{cB(s)} \\
&= Y(s)
\end{aligned} \]

Thus \(\{Y(t)\}\) is a Martingale.

\[ E[Y(t)] = E[Y(0)] = 1 \]

20. By the Martingale stopping theorem

\[ E[B(T)] = E[B(0)] = 0 \]

However, \(B(T) = 2 - 4T\) and so \(2 - 4E[T] = 0\), or \(E[T] = 1/2\).

24. It follows from the Martingale stopping theorem and the result of Exercise 18 that

\[ E[B^2(T) - T] = 0 \]

where \(T\) is the stopping time given in this problem and

\[ B(t) = \frac{X(t) - \mu t}{\sigma} \]

Therefore,

\[ E\left[\frac{(X(T) - \mu T)^2}{\sigma^2} - T\right] = 0 \]

However, \(X(T) = x\) and so the preceding gives that

\[ E[(x - \mu T)^2] = \sigma^2E[T] \]

But, from Exercise 21, \(E[T] = x/\mu\) and so the preceding is equivalent to

\[ \mathrm{Var}(\mu T) = \sigma^2\frac{x}{\mu} \quad \text{or} \quad \mathrm{Var}(T) = \sigma^2\frac{x}{\mu^3} \]

25. Let \(X_\Delta(t)\) be the value of the process at time \(t\). With \(I_i = 1\) if the \(i\)th change is an increase and \(-1\) if it is a decrease, then

\[ X_\Delta(t) = \sigma\sqrt{\Delta}\sum_{i=1}^{[t/\Delta]} I_i \]

Because the \(I_i\) are independent it is clear that this process has independent increments and that the limiting process (as \(\Delta \to 0\)) will have stationary increments. Also, by the central limit theorem, it is clear that the limiting distribution of \(X_\Delta(t)\) will be normal. The result now follows because

\[ \begin{aligned}
E[X_\Delta(t)] &= \sigma\sqrt{\Delta}\,[t/\Delta](2p - 1) \\
&= \mu\Delta[t/\Delta] \to \mu t
\end{aligned} \]

and

\[ \mathrm{Var}(X_\Delta(t)) = \sigma^2\Delta[t/\Delta](1 - (2p - 1)^2) \to \sigma^2 t \]

where the preceding used that \(p \to 1/2\) as \(\Delta \to 0\).

26. With \(M(t) = \max_{0 \le y \le t} X(y)\),

\[ P(T_y < \infty) = \lim_{t \to \infty} P(M(t) \ge y) \]

and the result follows from Corollary 10.1 since \(\lim_{s \to \infty}\bar{\Phi}(s) = 0\) and \(\lim_{s \to \infty}\bar{\Phi}(-s) = 1\). Hence, for \(\mu < 0\),

\[ P(M \ge y) = \lim_{t \to \infty} P(M(t) \ge y) = e^{2y\mu/\sigma^2}. \]


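27. Using that \(\{-X(y), y \ge 0\}\) is Brownian motion with drift parameter \(-\mu\) and variance parameter \(\sigma^2\), we obtain from Corollary 10.1 that for \(s < 0\)

\[ \begin{aligned}
P\left(\min_{0 \le y \le t} X(y) \le s\right) &= P\left(\max_{0 \le y \le t} -X(y) \ge -s\right) \\
&= e^{2s\mu/\sigma^2}\,\bar{\Phi}\left(\frac{-\mu t - s}{\sigma\sqrt{t}}\right) + \bar{\Phi}\left(\frac{-s + \mu t}{\sigma\sqrt{t}}\right)
\end{aligned} \]

30. \(E[X(a^2t)/a] = (1/a)E[X(a^2t)] = 0\). For \(s < t\),

\[ \mathrm{Cov}(Y(s), Y(t)) = \frac{1}{a^2}\mathrm{Cov}(X(a^2s), X(a^2t)) = \frac{1}{a^2}a^2s = s \]

Because \(\{Y(t)\}\) is clearly Gaussian, the result follows.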


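33. (a) Starting at any time \(t\) the continuation of the Poisson process remains a Poisson process with rate \(\lambda\).

(b)

\[ \begin{aligned}
E[Y(t)Y(t + s)] &= \int_0^\infty E[Y(t)Y(t + s) \mid Y(t) = y]\lambda e^{-\lambda y}\,dy \\
&= \int_0^s yE[Y(t + s) \mid Y(t) = y]\lambda e^{-\lambda y}\,dy + \int_s^\infty y(y - s)\lambda e^{-\lambda y}\,dy \\
&= \int_0^s y\frac{1}{\lambda}\lambda e^{-\lambda y}\,dy + \int_s^\infty y(y - s)\lambda e^{-\lambda y}\,dy
\end{aligned} \]

where the preceding used that

\[ E[Y(t)Y(t + s) \mid Y(t) = y] = \begin{cases} yE[Y(t + s)] = y/\lambda, & \text{if } y < s \\ y(y - s), & \text{if } y > s \end{cases} \]

Hence,

\[ \mathrm{Cov}(Y(t), Y(t + s)) = \int_0^s ye^{-\lambda y}\,dy + \int_s^\infty y(y - s)\lambda e^{-\lambda y}\,dy - \frac{1}{\lambda^2} \]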

Chapter 11 

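1. (a) Let \(U\) be a random number. If \(\sum_{j=1}^{i-1} P_j < U \le \sum_{j=1}^{i} P_j\) then simulate from \(F_i\). (In the preceding \(\sum_{j=1}^{i-1} P_j \equiv 0\) when \(i = 1\).)

(b) Note that

\[ F(x) = \frac{1}{3}F_1(x) + \frac{2}{3}F_2(x) \]

where

\[ \begin{aligned}
F_1(x) &= 1 - e^{-2x}, \quad 0 < x < \infty \\
F_2(x) &= \begin{cases} x, & 0 < x < 1 \\ 1, & 1 < x \end{cases}
\end{aligned} \]

Hence, using part (a), let \(U_1, U_2, U_3\) be random numbers and set

\[ X = \begin{cases} \dfrac{-\log U_2}{2}, & \text{if } U_1 < \frac{1}{3} \\ U_3, & \text{if } U_1 > \frac{1}{3} \end{cases} \]

The preceding uses the fact that \(-\log U_2/2\) is exponential with rate 2.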
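
The composition method of Exercise 1(b) can be sketched in code as follows. The sketch assumes the mixture puts weight 1/3 on the exponential with rate 2 and weight 2/3 on the uniform (0, 1) distribution; the function names are illustrative only.

```python
import math
import random


def sample_mixture(rng):
    """Draw one value by the composition method.

    With probability 1/3 return an exponential with rate 2 (as -log(U)/2),
    otherwise return a uniform (0, 1) value.
    """
    u1 = rng.random()
    if u1 < 1.0 / 3.0:
        u2 = 1.0 - rng.random()  # lies in (0, 1], so log(u2) is defined
        return -math.log(u2) / 2.0
    return rng.random()


def sample_mean(n, seed=0):
    """Average of n simulated values, using a seeded generator."""
    rng = random.Random(seed)
    return sum(sample_mixture(rng) for _ in range(n)) / n


if __name__ == "__main__":
    # Both mixture components have mean 1/2, so the average should be near 0.5.
    print(sample_mean(200_000, seed=1))
```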

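3. If a random sample of size \(n\) is chosen from a set of \(N + M\) items of which \(N\) are acceptable, then \(X\), the number of acceptable items in the sample, is such that

\[ P\{X = k\} = \binom{N}{k}\binom{M}{n - k}\bigg/\binom{N + M}{n} \]

To simulate \(X\), note that if

\[ I_j = \begin{cases} 1, & \text{if the } j\text{th selection is acceptable} \\ 0, & \text{otherwise} \end{cases} \]

then

\[ P\{I_j = 1 \mid I_1, \ldots, I_{j-1}\} = \frac{N - \sum_{i=1}^{j-1} I_i}{N + M - (j - 1)} \]

Hence, we can simulate \(I_1, \ldots, I_n\) by generating random numbers \(U_1, \ldots, U_n\) and then setting

\[ I_j = \begin{cases} 1, & \text{if } U_j < \dfrac{N - \sum_{i=1}^{j-1} I_i}{N + M - (j - 1)} \\ 0, & \text{otherwise} \end{cases} \]

and \(X = \sum_{j=1}^n I_j\) has the desired distribution.

Another way is to let

\[ X_j = \begin{cases} 1, & \text{the } j\text{th acceptable item is in the sample} \\ 0, & \text{otherwise} \end{cases} \]

and then simulate \(X_1, \ldots, X_N\) by generating random numbers \(U_1, \ldots, U_N\) and then setting

\[ X_j = \begin{cases} 1, & \text{if } U_j < \dfrac{n - \sum_{i=1}^{j-1} X_i}{N + M - (j - 1)} \\ 0, & \text{otherwise} \end{cases} \]

and \(X = \sum_{j=1}^N X_j\) then has the desired distribution.

The former method is preferable when \(n \le N\) and the latter when \(N \le n\).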
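
The first method of Exercise 3 (one indicator per selection) can be sketched as follows. Selection \(j\) is taken to be acceptable with probability equal to the number of acceptable items not yet chosen divided by the \(N + M - (j - 1)\) items remaining. The function name and the sample sizes are assumptions for this example.

```python
import random


def simulate_acceptable_count(n, N, M, rng):
    """Simulate the number of acceptable items in a sample of size n drawn
    without replacement from N acceptable and M unacceptable items."""
    if not 0 <= n <= N + M:
        raise ValueError("sample size must be between 0 and N + M")
    remaining_acceptable = N
    count = 0
    for j in range(1, n + 1):
        remaining_items = N + M - (j - 1)
        # The j-th selection is acceptable with this conditional probability.
        if rng.random() < remaining_acceptable / remaining_items:
            count += 1
            remaining_acceptable -= 1
    return count


if __name__ == "__main__":
    rng = random.Random(2)
    draws = [simulate_acceptable_count(5, 6, 4, rng) for _ in range(20_000)]
    # The hypergeometric mean is n * N / (N + M) = 3 for these values.
    print(sum(draws) / len(draws))
```

The second method in the exercise is the same loop with the roles exchanged: it runs over the \(N\) acceptable items and tracks how many sample slots remain.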

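6. Let

\[ \begin{aligned}
c(\lambda) = \max_x\left\{\frac{f(x)}{\lambda e^{-\lambda x}}\right\} &= \frac{2}{\lambda\sqrt{2\pi}}\max_x\left[\exp\left\{\frac{-x^2}{2} + \lambda x\right\}\right] \\
&= \frac{2}{\lambda\sqrt{2\pi}}\exp\left\{\frac{\lambda^2}{2}\right\}
\end{aligned} \]

Hence,

\[ \frac{d}{d\lambda}c(\lambda) = \sqrt{2/\pi}\,\exp\left\{\frac{\lambda^2}{2}\right\}\left[1 - \frac{1}{\lambda^2}\right] \]

Hence \((d/d\lambda)c(\lambda) = 0\) when \(\lambda = 1\) and it is easy to check that this yields the minimal value of \(c(\lambda)\).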
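
To illustrate Exercise 6, the sketch below evaluates \(c(\lambda) = \sqrt{2/\pi}\,e^{\lambda^2/2}/\lambda\) and runs the rejection method with the optimal rate \(\lambda = 1\). It assumes the target density is that of the absolute value of a standard normal, \(f(x) = (2/\sqrt{2\pi})e^{-x^2/2}\) for \(x > 0\), which is what the formula for \(c(\lambda)\) implies; with \(\lambda = 1\) the acceptance probability \(f(y)/(c\,e^{-y})\) simplifies to \(e^{-(y-1)^2/2}\). Function names are illustrative.

```python
import math
import random


def c_of_lambda(lam):
    """Rejection constant when an exponential with rate lam is the proposal."""
    return math.sqrt(2.0 / math.pi) * math.exp(lam * lam / 2.0) / lam


def sample_abs_normal(rng):
    """Rejection method with an exponential (rate 1) proposal."""
    while True:
        y = -math.log(1.0 - rng.random())  # exponential with rate 1
        if rng.random() <= math.exp(-((y - 1.0) ** 2) / 2.0):
            return y


if __name__ == "__main__":
    rng = random.Random(5)
    values = [sample_abs_normal(rng) for _ in range(100_000)]
    # E|Z| = sqrt(2/pi), roughly 0.798.
    print(c_of_lambda(1.0), sum(values) / len(values))
```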

16. (a) They can be simulated in the same sequential fashion in which they are defined. That is, first generate the value of a random variable \(I_1\) such that

\[ P\{I_1 = i\} = \frac{w_i}{\sum_{j=1}^n w_j}, \quad i = 1, \ldots, n \]

Then, if \(I_1 = k\), generate the value of \(I_2\) where

\[ P\{I_2 = i\} = \frac{w_i}{\sum_{j \ne k} w_j}, \quad i \ne k \]

and so on. However, the approach given in part (b) is more efficient.

(b) Let \(I_j\) denote the index of the \(j\)th smallest \(X_i\).
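23. Let \(m(t) = \int_0^t \lambda(s)\,ds\), and let \(m^{-1}(t)\) be the inverse function. That is, \(m(m^{-1}(t)) = t\).

(a)

\[ \begin{aligned}
P\{m(X_1) > x\} &= P\{X_1 > m^{-1}(x)\} \\
&= P\{N(m^{-1}(x)) = 0\} \\
&= e^{-m(m^{-1}(x))} \\
&= e^{-x}
\end{aligned} \]

(b)

\[ \begin{aligned}
& P\{m(X_i) - m(X_{i-1}) > x \mid m(X_1), \ldots, m(X_{i-1}) - m(X_{i-2})\} \\
&\quad = P\{m(X_i) - m(X_{i-1}) > x \mid X_1, \ldots, X_{i-1}\} \\
&\quad = P\{m(X_i) - m(X_{i-1}) > x \mid X_{i-1}\} \\
&\quad = P\{m(X_i) - m(X_{i-1}) > x \mid m(X_{i-1})\}
\end{aligned} \]

Now,

\[ \begin{aligned}
P\{m(X_i) - m(X_{i-1}) > x \mid X_{i-1} = y\} &= P\left\{\int_y^{X_i}\lambda(t)\,dt > x \,\Big|\, X_{i-1} = y\right\} \\
&= P\{X_i > c \mid X_{i-1} = y\} \quad \text{where } \int_y^c \lambda(t)\,dt = x \\
&= P\{N(c) - N(y) = 0 \mid X_{i-1} = y\} \\
&= P\{N(c) - N(y) = 0\} \\
&= \exp\left\{-\int_y^c \lambda(t)\,dt\right\} \\
&= e^{-x}
\end{aligned} \]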

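32.

\[ \begin{aligned}
\mathrm{Var}[(X + Y)/2] &= \frac{1}{4}[\mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)] \\
&= \frac{\mathrm{Var}(X) + \mathrm{Cov}(X, Y)}{2}
\end{aligned} \]

Now it is always true that

\[ \frac{\mathrm{Cov}(V, W)}{\sqrt{\mathrm{Var}(V)\mathrm{Var}(W)}} \le 1 \]

and so when \(X\) and \(Y\) have the same distribution \(\mathrm{Cov}(X, Y) \le \mathrm{Var}(X)\).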
Absorbing states, 185-186
Algorithmic efficiency, model for, 223 
Aloha protocol, instability of, 201-202 
Alternating renewal process, 439 
Aperiodic chain, 219-220 
Arbitrage, 615 
Arbitrage theorem, 616 
Arbitrary queueing system, 520 
Arrival theorem, 516-517 
Autoregressive process, 635 

B 

Backward approach, 257-258 
Balance equations, 375, 487-488 
Bayes’ formula, 11 

Bernoulli probability mass function, 691 
Bernoulli random variables, 26, 50-51, 

63-64, 281-282, 323, 365-366, 453- 
454, 595-596, 665, 691, 693 
expectation of, 35 
independent, 118 
Best prize problem, 118-119 
Beta density with parameters, 56-57 
Beta distribution, 661 
Binomial compounding distribution, 161 
Binomial distribution with parameters, 59 
Binomial probabilities, 199, 240 
Binomial random variables, 

26, 95, 665 
expectation of, 35-36 
variance of, 50-51 
Birth and death model, 357 
Birth and death processes, 364-365, 359-360

forward equations for, 373 
limiting probabilities for, 375 
Birth and death queueing models, 499 
Black-Scholes option pricing formula, 619 
Bonus Malus automobile insurance system, 
186, 218-219 


Boolean formula, 230 
Boolean variables, 230 
Bose-Einstein statistics, 141 
Box-Muller approach, 658-659 
Bridge system, 564 
Brownian bridge, 630-631 
Brownian motion, 607, 752 
Gaussian processes, 630 
independent increment assumption, 
609-610 

integrated, 631-632 
interpretation of, 609 
maximum variable and gambler’s ruin 
problem, 611 
pricing stock options 
arbitrage theorem, 616 
Black-Scholes option pricing 
formula, 619 

Brownian motion with drift, 624 
example in options pricing, 614 
process, 620 

with variance parameter, 622 
stationary processes, 633 
stochastic process, 608 
variations on 
with drift, 612 
geometric, 612 

weakly stationary processes, 637 
white noise, 628 

C 

Car buying model, 428 
Central limit theorem, 73-74, 225 
heuristic proof of, 76-77 
for renewal processes, 426 
Chapman-Kolmogorov equations, 187, 
195, 370, 372-373 
Chebyshev’s inequality, 72 
Chi-squared density, 98-99 
Chi-square distribution, 660 
definition of, 658 
Chi-squared random variable, 66-68
Class mobility model, 207 
Closed systems, network of queues, 514 
Common distribution function, 125-126 
Communication systems, 184 
Compounding distribution, 157-158 
Compound Poisson process, 327, 330-332 
examples of, 327 

Compound Poisson random variable, 113 
Compound random variables, 102 
identity for, 157 
variance of, 113 

Computer software package, 320 
Conditional density function, 333-334 
Conditional distribution 
of arrival times, 675 
function, 333-334 
Conditional expectation, 97 

probability and. See Conditional 
probability and expectation 
Conditional/mixed Poisson processes, 332 
Conditional Poisson process, 332-333 
Conditional probability, 6, 11 
density function, 97 
mass function, 93-95 
Conditional probability and expectation 
applications 

Bose-Einstein statistics, 141 
k-record values of discrete random
variables, 149

left skip free random walks, 152 
list model, 133 
mean time for patterns, 146 
Polya’s Urn model, 141 
random graph, 135 
computing expectations by 
conditioning, 100 
variances, 111 
computing probabilities by 
conditioning, 115 
continuous case, 97 
discrete case, 93 
identity for compound random 
variables, 157 

binomial compounding distribution, 161 
compounding distribution related to 
negative binomial, 162 
Poisson compounding distribution, 160 
introduction, 93 


Conditional variance 
defined, 112 

formula, 111-112, 234-235, 281-282, 
329-330, 333, 365-366, 498-499, 
531-532, 731, 748-749 
Continuous case, 97 
random variables, 37 
Continuous-Markov chain, 732 
Continuous random variables, 24, 30, 32-33 
exponential random variables, 32 
gamma random variables, 33 
general techniques for 
hazard rate method, 654 
inverse transformation method, 649 
rejection method, 650 
increasing runs of, 461 
normal random variables, 33 
with probability density function, 39 
special techniques for 
beta distribution, 661 
chi-squared distribution, 660 
exponential distribution, 662 
gamma distribution, 660 
normal distribution, 657 
Von Neumann algorithm, 662 
uniform random variable, 31 
Continuous-time Markov chain, 357-358, 
371, 374, 380-381, 384-385, 
387-388, 393, 481, 688-689 
birth and death processes, 359 
limiting probabilities, 374 
reversed chain, 387 
time reversibility, 380 
transition probabilities of, 366, 396 
transition probability function, 366 
uniformization, 393 
Continuous-time process, 78 
Continuous-time stochastic process, 358 
Convolution of distributions, 52 
Cost equations, 482 
Coupon collecting problem, 306-307 
Covariance, properties of, 48 
Coxian random variable, 296-297 
Cumulative distribution function, 24-25, 32, 42, 52

of random variable, 32 
and probability density, relationship 
between, 31 
Cut vector, 564 
D

Data sequence, defined, 149-150 
Decreasing failure rate (DFR), 581 
Delayed renewal processes, 452-453, 470 
Density function, 691 

of gamma random variable, 115-116 
Desired probability, 222 
Dirichlet distribution, 143-144 
Disconnected graph, 135 
Discrete case, 93 
random variables, 34 
Discrete random variables, 24-25 
Bernoulli random variable, 26 
binomial random variable, 26 
distributed, 146 
geometric random variable, 28 
k-record values of, 149
patterns of, 453 
Poisson random variable, 29 
with probability mass function, 39 
Discrete-time Markov chains, 380-383, 393 
Discrete-time process, 78 
Distributed discrete random variables, 146 
Distributed random variables, independent
and identically, 424-425, 459
Drift, Brownian motion with, 612 

E 

Elementary renewal theorem, 419, 424-425, 462
proof of, 422-423 
Embedded chain, 380 
Equilibrium distribution, 442 
Ergodic continuous-time Markov chain, 
387-388 

Ergodic Markov chain, 242 
Erlang’s loss formula, 482 
Erlang’s loss system, 542 
Event times, simulating, 676 
Exponential distribution, 662 
definition, 278 

exponential random variables, 
convolutions of, 293 
with parameter X, 60 
properties of, 277, 280, 287 
Exponential interarrival random 
variables, 493-494 
Exponential models 


birth and death queueing models, 499 
queueing system with bulk service, 507 
shoe shine shop, 505 
single-server exponential queueing 
system, 486 

with finite capacity, 495 
Exponential random variables, 32, 285-286, 
450, 492, 650 
convolutions of, 293 
expectation of, 37 

and expected discounted returns, 279 

F 

Failure rate function, 284-285, 

293, 296, 581 
of distribution, 585 
of hyperexponential random variable, 
285-286 

Feedback arrival, 513-514 
FIFO, 525 

Finite source model, 538 
Finite-state Markov chain, 197 
First-order autoregressive process, 635 
Five-component system, 562-563, 567 
Forward approach, 257-258 
Four-component structure, 561-562 
Fourier transforms, 638-639 
Front-of-the-line rule, 134, 244 
Fundamental queueing identity, 503-504 

G 

Gambler’s fortune, 156-157 
Gambler’s ruin problem, 220, 232, 611 
Gambling model, 185-186 
Gamma distribution, 115-116, 582, 660 
Gamma random variables, 33 
density function of, 115-116 
Gaussian processes, 630 

finite dimensional distributions of, 634 
Genetics, Markov chain in, 208 
Geometric Brownian motion, 612 
Geometric distribution, 665 
Geometric random variable, 28, 103, 
105-107, 151 
expectation of, 36 
variance of, 111-112 
Gibbs sampler, 249-250 
simulation, 519 
G/M/k queue, 544
G/M/1 model, 534
busy and idle periods, 538 
Graph 

connected, 241 

components, 139-140 
disconnected, 135 
random, 135 
Greedy algorithms, 288 

H 

Hardy-Weinberg law, 208 
Hastings-Metropolis algorithm, 247-250 
Hawkes processes, random intensity 
functions and, 334 
Hazard function, 585 

Hazard rate function. See Failure rate function 
Hazard rate method, 654 
Heuristic proof of equation, 483 
Hidden Markov chains, 254 
predicting states, 259 
Hitting time theorem, 154 
Hyperexponential random variable, 285-286 
Hypergeometric distribution, 51, 95 
Hypoexponential random variable, 293-296 

I 

Ignatov’s theorem, 125-126 
Impulse response function, 637 
Inclusion-exclusion bounds, 573 
Inclusion-exclusion identity, 6, 69-71 
Inclusion-exclusion theorem, 128 
Increasing failure rate (IFR) distribution, 581, 586

Increasing failure rate on the average (IFRA) 
distribution, 559 

Independent Bernoulli random variables, 74 
distribution of, 118 
Independent Bernoulli trials, 148 
Independent components, reliability 
systems of, 565 
Independent events, 9 
Independent geometries, 733 
Independent increments, 297, 752 
assumption, 297-298, 302, 609-610 
Independent random variables, 45, 306-307, 653
binomial, 62 
distributed, 424-425 


exponential, 99-100, 289-290, 306-307, 
539-540 

finite-valued, 150 
normal, 63, 67, 713 
Poisson, 53-54, 63, 306 
sequence of, 329-330 
standard normal, 67 
Independent time reversible 
continuous-time 
Markov chains, 386 
Independent with distribution 
function, 313 

Indicator random variable, 23 
Induction hypothesis, 137, 154-156 
Infinite server Poisson queue, output 
process of, 326 

Infinite server queue, 311, 441-442 
Instantaneous transition rates, 368, 384 
Insurance ruin problem, 462 
Integrated Brownian motion, 631-632 
Intensity function, nonhomogeneous Poisson 
process with, 322 
Interarrival density, 469 
Inverse transformation method, 649 
Irreducible Markov chain, 245 

J 

Joint cumulative probability distribution 
function, 42 

Joint density function of n random 
variables, 57 

Joint distribution functions, 42 
Jointly distributed random variables 
covariance and variance of, 46 
independent random variables, 45 
joint distribution functions, 42 
joint probability distribution of 
functions, 55 

Joint probability distributions, 43 
Joint probability mass function, 42 

K 

“Key renewal theorem,” 436 
Kolmogorov backward equations, 370, 
396-397 

Kolmogorov forward equations, 372-373 
k-out-of-n system, 560-561, 563 
with equal probabilities, 567 
with identical components, 583-584

L

Laplace transform, 64, 299, 322 
Limiting probabilities for Markov chain, 209 
Limit theorems, 71 

Linear birth rate, birth process with, 360 
Linear growth model with immigration, 361, 
376-377 

Linear program, 253-254 
Little’s formula, 483 

Long-run proportions, Markov chain, 204 

M 

Machine repair model, 377-378 
Markov chain, 183-184, 534, 537 
applications 

algorithmic efficiency, model for, 223 
gambler’s ruin problem, 220 
probabilistic algorithm for satisfiability 
problem, 226 
branching processes, 234 
Chapman-Kolmogorov equations, 187 
classification of states, 194 
defined, 191 
ergodic, 242 
finite-state, 197 
in genetics, 208 
hidden, 254 

predicting the states, 259 
irreducible, 245 

limiting probabilities for, 204, 209, 219 
long-run proportions and, 204 
Markov decision processes, 251 
mean pattern times in, 215 
mean time spent in transient states, 231 
Monte Carlo methods, 247 
stationary distribution of, 695 
stationary probabilities, 217, 539-540 
of successive states, 218-219 
time reversible, 237 
transforming process into, 185 
transition probabilities, 186, 212, 238, 544 
transition probability matrix, 189-190, 
195-196, 198, 206-207 
two-dimensional, 252-253 
two-state, 219-220 
Markov decision processes, 251 
Markovian property, 437 
Markov’s inequality, 72, 711 


Markov transition probability matrix, 
247-248, 515 
Martingale process, 623 
Martingale stopping theorem, 753 
Matching rounds problem, variance in, 
113-114 
Mean value 
analysis, 517 
function, 322 
M/G/k queue, 546
M/G/1 system

application of, 520 
busy periods, 522 
optimization example, 527 
preliminaries, 520 
queue with server breakdown, 531 
variations on, 523 
Minimal cut set, 564 
Minimal cut vector, 564 
Minimal path set, 562 
Minimal path vector, 562 
M/M/1 queue, 363, 378, 487, 500
with balking, 500 

with impatient customers, 502-503 
steady-state customer, 492 
M/M/k queue, 544

Moment generating functions, 58, 713, 747 
formula for, 753 

joint distribution of mean and variance, 66 
number of events, distribution of, 69 
Monotone, 562 

system of independent components, 587 
Monte Carlo methods, 247 
Monte Carlo simulation, 247 
approach, 645-646 
Moving average process, 636 
m-step transition probability, 191 
Multinomial distribution, 104-105, 143-144 
Multinomial probability, 199-200 
Multiserver exponential queueing system, 
363, 376 

Multiserver queues, 542 
Multiserver systems, 482 
Multivariate normal distribution, 65, 67 

N 

NBU. See New better than used
Negative binomial distribution, 129-130, 411, 715

of noncentral chisquared random 
variable, 718 
Network of queues 
closed systems, 514 
open systems, 510 
New better than used (NBU), 457 
Nodes, 135 

Nonhomogeneous Poisson process, 322, 
334-335, 672, 685 
with intensity function, 731 
Nonnegative random variable, 591-592, 712 
Nonoverlapping patterns, 147-148 
Nonstationary Poisson process. See 

Nonhomogeneous Poisson process 
Normal distribution, 657 
with parameter μ, 60-61
Normal random variables, 33, 652-653 
expectation of, 38 
variance of, 41 

n-step transition probabilities, 187-188 

O 

One-closer rule, 134, 243 
One-step transition probabilities, 187-188 
Open systems, network of queues, 510 
Order statistics, 54-55 
Ornstein-Uhlenbeck process, 635 

P 

Pairwise independent events, 10 
Parallel structure, 560 
Parallel system, 559, 566 
PASTA principle, 434, 486 
Path vector, 562 
Periodic chain, 219-220 
Poisson compounding distribution, 160 
Poisson distribution, 64, 212-215, 492 
with mean λ, 59-60
Poisson mean, 115-116 
Poisson paradigm, 63-64 
Poisson process, 277, 299-300, 308-309,
320, 360, 409, 448-449,
462-463, 470, 481, 486, 522, 527,
623, 633, 646, 654-655, 661-662,
666-667, 742
assumption, 671-672 


conditional distribution of arrival 
times, 309 

counting processes, 297 
definition of, 298 
generalizations of, 322 
interarrival and waiting time 
distributions, 301 
nonhomogeneous, 672 
properties of, 303 
sampling, 311, 672 
software reliability, estimating, 320 
two-dimensional, 677 
Poisson random variables, 29, 95-96, 117,
132, 139, 186-187, 212-213, 218-219,
323, 329, 331, 531, 533, 666,
730, 737

expectation of, 36-37 
variance of, 312-315 
Polar method, 660 
Pollaczek-Khintchine formula, 521 
Polya’s Urn model, 141 
Power spectral density, 638-639 
Preceding method, 149 
Present value, 614 
Pricing stock options 
arbitrage theorem, 616 
Black-Scholes option pricing formula, 619 
example in options pricing, 614 
Priority queueing systems, 524-525 
Priority queues, 524 
Probabilistic algorithm, 230 
Probability density 

function, 30-32, 125-126, 712 
relationship between cumulative 
distribution and, 31 

Probability mass function, 25, 29, 42-43, 
69-70, 125-126 
Probability theory, 1 
Bayes’ formula, 11 
conditional probabilities, 6 
independent events, 9 
probabilities defined on events, 4 
sample space and events, 1 
Production process, 439 
Pure birth process, 357 

Q 

Queueing cost identity, 489-490
Queueing models, fundamental quantities
for, 482

Queueing system, 684, 689 
with bulk service, 507 
Queueing theory 
exponential models 

birth and death queueing models, 499 
queueing system with bulk service, 507 
shoe shine shop, 505 
single-server exponential queueing 
system, 486, 495 
finite source model, 538 
G/M/1 model, 534
M/G/1 system

application of, 520 
busy periods, 522 
optimization example, 527 
preliminaries, 520 
queue with server breakdown, 531 
variations on, 523 
multiserver queues, 542 
network of queues 
closed systems, 514 
open systems, 510 
preliminaries 

cost equations, 482 
steady-state probabilities, 484 
Quick-sort algorithm, 108-110 

R 

Random graph, 135, 141, 575 
Random intensity functions, 334-335 
and Hawkes processes, 334 
Random numbers, 646 
Random permutation, generating, 646-648 
Random-sized batch arrivals, M/G/1
with, 523

Random telegraph signal process, 633-634, 636

Random variables, 21, 652-653, 675 
covariance and variance of sums of, 46 
density function, 652 
expectation of 

continuous case, 37 
discrete case, 34 
function of, 38 
expected value of, 40 
joint probability distribution of 
functions, 55 


Random walk model, 185, 198 
Random walk process, 424-425 
Rate-equality principle, 487-488, 495 
Rate of the distribution, 285 
Recursive equation, 121-122 
Rejection method, 650 
Reliability function, 565-567, 683 
bounds on, 570 

inclusion and exclusion, method 
of, 570 

obtaining bounds on r(p), method 
for, 578 

Reliability theory, 559 

independent components, reliability 
systems of, 565 

reliability function, bounds on, 570 
inclusion and exclusion method, 570 
obtaining bounds on r(p), 578 
structure functions, 560 

minimal path and minimal cut 
sets, 562 

system lifetime, expected, 587 

parallel system, upper bound on, 591 
systems with repair, 593 

suspended animation, series model 
with, 597 

Renewal arrivals, queueing system with, 437 
Renewal equation, 413-414 
Renewal function, 412-413 
computing, 449 

Renewal interarrival distribution, 450 
Renewal process, 409-410 
age of, 440 

average, 432-433 
average excess of, 433-434 
excess of, 441 

reward processes, 427, 430-431, 686 
Renewal reward theory, 432-434, 476
Renewal theory and applications 
distribution of N(t), 411
inspection paradox, 447 
introduction, 409 

limit theorems and applications, 415 
patterns, applications to 

continuous random variables, 461 
discrete random variables, 453 
distinct values, expected time to 
maximal run of, 459 
insurance ruin problem, 462 





regenerative processes, 436 
alternating, 439 

renewal function, computing, 449 
renewal reward processes, 427 
semi-Markov processes, 444 
Reverse chain equations, 387, 389 
Reversed process, 382-383 
Reverse time equations, 390-392 

S 

Sample mean, 49, 625 
Sample variance, 66 
Satisfiability problem, 230 

probabilistic algorithm for, 226 
Second-order stationary process, 634, 636 
Self-exciting process, 335 
Semi-Markov processes, 444-445, 477, 742 
Sequence of interarrival times, 301-303 
Sequential queueing system, 391 
Series system, 559 
Simplex algorithm, 253-254 
Simulation, 645 

continuous random variables. See 
Continuous random variables 
determining number of runs, 694 
from discrete distributions, 664 
alias method, 667 
renewal function by, 686 
stationary distribution of Markov chain, 
coupling from past, 695, 697 
stochastic processes, 671 

nonhomogeneous Poisson process, 672 
two-dimensional Poisson process, 677 
of two-dimensional Poisson 
processes, 646 

variance reduction techniques, 680 
by conditioning, 684 
control variates, 688 
importance sampling, 690 
use of antithetic variables, 681 
Single-server exponential queueing system, 
486, 507-508 
finite capacity, 495 

Single-server Poisson arrival queues, 328 
Single-server queue, 694 
Single-server system, 482 
Sorting algorithm, 671 
Standard Brownian motion, 608, 632 
Standard normal distribution function, 426 


Standard normal random variables, 68, 654 
Standard/unit normal distribution, 34 
State space of stochastic process, 78 
State vector, 560 

Stationary distribution of Markov 
chain, 695 

Stationary ergodic Markov chain, 237 
Stationary increments, 298, 302, 752 
Stationary probabilities, 211-215, 375 
Stationary processes, 633 
weakly, 634 

Steady-state distribution, 544 
Steady-state probability, 253-254, 484, 514

Stirling’s approximation, 199-200, 203, 
225-226 

Stochastic processes, 77, 144-145, 183, 436, 608

Strong law of large numbers, 73 
Structure function, 560 
Symmetric random walk, 199 

T 

Tail distribution function, 293-296 
Tandem queue, 512 
Tandem/sequential system, 510 
Taylor series expansion, 76-77 
Time inventory, long-run proportion of, 
443-444 

Time reversibility, 380 
equations, 241-242, 736 
Time reversible chain, 385 
Time reversible equations, 384-385 
Time reversible Markov chain, 237 
T-random variable, 98-99 
Transition probabilities 
computing, 396 

of continuous-time Markov chain, 366 
defined, 237 
function, 366 
matrix, 724 

Two-dimensional Poisson 
process, 677 

Two independent uniform random variables, 
sum of, 53 

Two-state continuous-time Markov chain, 
394, 422, 424, 593 
Two-state Markov chain, 219-220 
Two-step transition matrix, 188-189

U

Unconditional probabilities, 194
Uniform distribution, 143-144
Uniformization, 393
Uniformly distributed components, series
system of, 588
Uniform priors, 141
Uniform random variable, 31, 712
expectation of, 37

V

Variance parameter, process with, 622
Viterbi algorithm, 259
Von Neumann algorithm, 662

W

Wald’s equation, 419-420, 656, 663-664, 739
Weakly stationary processes, 634
harmonic analysis of, 637
Weibull distribution, 582
White noise transformation, 628
Wiener process, 608

Y 

Yule process, 360, 367-368 



